Preface


The purpose of this work is to provide an introduction to the
mathematical theory of multi-stage decision processes. Since these constitute
a somewhat formidable set of terms we have coined the term "dynamic
programming" to describe the subject matter. Actually, as we shall see,
the  distinction  involves  more  than  nomenclature.  Rather,  it  involves  a 
certain  conceptual  framework  which  furnishes  us  a  new  and  versatile 
mathematical  tool  for  the  treatment  of  many  novel  and  interesting 
problems  both  in  this  new  discipline  and  in  various  parts  of  classical 
analysis.  Before  expanding  upon  this  theme,  let  us  present  a  brief 
discussion  of  what  is  meant  by  a  multi-stage  decision  process. 

Let  us  suppose  that  we  have  a  physical  system  S  whose  state  at  any 
time t is specified by a vector p. If we are in an optimistic frame of mind
we  can  visualize  the  components  of  p  to  be  quite  definite  quantities  such 
as  Cartesian  coordinates,  or  position  and  momentum  coordinates,  or 
perhaps  volume  and  temperature,  or  if  we  are  considering  an  economic 
system,  supply  and  demand,  or  stockpiles  and  production  capacities.  If 
our mood is pessimistic, the components of p may be supposed to be
probability  distributions  for  such  quantities  as  position  and  momentum, 
or  perhaps  moments  of  a  distribution. 

In  the  course  of  time,  this  system  is  subject  to  changes  of  either 
deterministic  or  stochastic  origin  which,  mathematically  speaking,  means 
that  the  variables  describing  the  system  undergo  transformations. 
Assume  now  that  in  distinction  to  the  above  we  have  a  process  in  which 
we  have  a  choice  of  the  transformations  which  may  be  applied  to  the 
system  at  any  time.  A  process  of  this  type  we  call  a  decision  process, 
with  a  decision  equivalent  to  a  transformation.  If  we  have  to  make  a 
single  decision,  we  call  the  process  a  single-stage  process;  if  a  sequence 
of decisions, then we use the term multi-stage decision process.

The  distinction,  of  course,  is  not  hard  and  fast.  The  choice  of  a  point 
in  three-dimensional  space  may  be  considered  to  be  a  single-stage  process 
wherein  we  choose  (x,  y,  z),  or  a  multi-stage  process  where  we  choose 
first x, then y, and then z.

There  are  a  number  of  multi-stage  processes  which  are  quite  familiar 
to us. Perhaps the most common are those occurring in card games, such
as the bidding system in contract bridge, or the raise-counter-raise
system  of  poker  with  its  delicate  overtones  of  bluffing.  On  a  larger  scale, 
we  continually  in  our  economic  life  engage  in  multi-stage  decision 
processes  in  connection  with  investment  programs  and  insurance  policies. 
In  the  scientific  world,  control  processes  and  the  design  of  experiments 
furnish  other  examples. 

The point we wish to make is that in modern life, in economic,
industrial, scientific and even political spheres, we are continually
surrounded by multi-stage decision processes. Some of these we treat on
the  basis  of  experience,  some  we  resolve  by  rule-of-thumb,  and  some  are 
too  complex  for  anything  but  an  educated  guess  and  a  prayer. 

Unfortunately  for  the  peace  of  mind  of  the  economist,  industrialist, 
and engineer, the problems that have arisen in recent years in the
economic, industrial, and engineering fields are too vast in portent and
extent  to  be  treated  in  the  haphazard  fashion  that  was  permissible  in  a 
more  leisurely  bygone  era.  The  price  of  tremendous  expansion  has  become 
extreme  precision. 

These  problems,  although  arising  in  a  multitude  of  diverse  fields,  share 
a  common  property — they  are  exceedingly  difficult.  Whether  they  arise 
in  the  study  of  optimal  inventory  or  stock  control,  or  in  an  input-output 
analysis  of  a  complex  of  interdependent  industries,  in  the  scheduling  of 
patients  through  a  medical  clinic  or  the  servicing  of  aircraft  at  an 
airfield,  the  study  of  logistics  or  investment  policies,  in  the  control  of 
servo  mechanisms,  or  in  sequential  testing,  they  possess  certain  common 
thorny  features  which  stretch  the  confines  of  conventional  mathematical 
theory. 

It  follows  that  new  methods  must  be  devised  to  meet  the  challenge  of 
these  new  problems,  and  to  a  mathematician  nothing  could  be  more 
pleasant.  It  is  a  characteristic  of  this  species  that  its  members  are 
never  so  happy  as  when  confronted  by  problems  which  cannot  be 
solved — immediately.  Although  the  day  is  long  past  when  anyone 
seriously  worried  about  the  well  of  mathematical  invention  running  dry, 
it is still nonetheless a source of great delight to see a vast untamed
jungle of difficult and significant problems, such as those furnished by
the theory of multi-stage decision processes, suddenly appear before us.

Having conjured up this preserve of problems, let us see what compass
we shall use to chart our path into this new domain. The conventional
approach we may label "enumerative." Each decision may be thought
of as a choice of a certain number of variables which determine the
transformation to be employed; each sequence of choices, or policy as we
shall say, is a choice of a larger set of variables. By lumping all these
choices together, we "reduce" the problem to a classical problem of
determining the maximum of a given function. This function, which
arises in the course of measuring some quantitative property of the
system, serves the purpose of evaluating policies.

At  this  point  it  is  very  easy  for  the  mathematician  to  lose  interest 
and  let  the  computing  machine  take  over.  To  maximize  a  reasonably 
well-behaved  function  seems  a  simple  enough  task;  we  take  partial 
derivatives and solve the resulting system of equations for the
maximizing point.

There  are,  however,  some  details  to  consider.  In  the  first  place,  the 
effective  analytic  solution  of  a  large  number  of  even  simple  equations 
as,  for  example,  linear  equations,  is  a  difficult  affair.  Lowering  our  sights, 
even a computational solution usually has a number of difficulties of
both  gross  and  subtle  nature.  Consequently,  the  determination  of  this 
maximum  is  quite  definitely  not  routine  when  the  number  of  variables 
is  large. 

All this may be subsumed under the heading "the curse of dimensionality."
Since this is a curse which has hung over the head of the physicist
and  astronomer  for  many  a  year,  there  is  no  need  to  feel  discouraged 
about  the  possibility  of  obtaining  significant  results  despite  it. 

However,  this  is  not  the  sole  difficulty.  A  further  characteristic  of 
these problems, as we shall see in the ensuing pages, is that calculus is
not always sufficient for our purposes, as a consequence of the perverse
fact  that  quite  frequently  the  solution  is  a  boundary  point  of  the  region 
of  variation.  This  is  a  manifestation  of  the  fact  that  many  decision 
processes  embody  certain  all-or-nothing  characteristics.  Very  often  then, 
we are reduced to determining the maximum of a function by a
combination of analytic and "hunt and search" techniques.

Whatever  the  difficulties  arising  in  the  deterministic  case  which  we 
have  tacitly  been  assuming  above,  these  difficulties  are  compounded  in 
the stochastic case, where the outcome of a decision, or transformation,
is  a  random  variable.  Here  any  crude  lumping  or  enumerative  technique 
is  surely  doomed  by  the  extraordinary  manner  in  which  the  number  of 
combinations  of  cases  increases  with  the  number  of  cases. 

Assume,  however,  that  we  have  circumvented  all  these  difficulties  and 
have attained a certain computational nirvana. Withal the
mathematician has not discharged his responsibilities. The problem is not to
be  considered  solved  in  the  mathematical  sense  until  the  structure  of  the 
optimal  policy  is  understood. 

Interestingly  enough,  this  concept  of  the  mathematical  solution  is 
identical with the proper concept of a solution in the physical, economic,
or engineering sense. In order to make this point clear (and it is a most
important point, since in many ways it is the raison d'être for
mathematical physics, mathematical economics, and many similar hybrid
fields), let us make a brief excursion into the philosophy of mathematical
models. 

The  goal  of  the  scientist  is  to  comprehend  the  phenomena  of  the 
universe  he  observes  around  him.  To  prove  that  he  understands,  he  must 
be able to predict, and to predict, one requires quantitative
measurements. A qualitative prediction such as the occurrence of an eclipse
or  an  earthquake  or  a  depression  sometime  in  the  near  future  does  not 
have  the  same  satisfying  features  as  a  similar  prediction  associated  with 
a  date  and  time,  and  perhaps  backed  up  by  the  offer  of  a  side  wager. 

To  predict  quantitatively  one  must  have  a  mechanism  for  producing 
numbers,  and  this  necessarily  entails  a  mathematical  model.  It  seems 
reasonable  to  suppose  that  the  more  realistic  this  mathematical  model, 
the  more  accurate  the  prediction. 

There  is,  however,  a  point  of  diminishing  returns.  The  actual  world  is 
extremely  complicated,  and  as  a  matter  of  fact  the  more  that  one  studies 
it the more one is filled with wonder that we have even "order of
magnitude" explanations of the complicated phenomena that occur, much
less fairly consistent "laws of nature." If we attempt to include too many
features  of  reality  in  our  mathematical  model,  we  find  ourselves  engulfed 
by  complicated  equations  containing  unknown  parameters  and  unknown 
functions.  The  determination  of  these  functions  leads  to  even  more 
complicated equations with even more unknown parameters and functions,
and  so  on.  Truly  a  tale  that  knows  no  end. 

If,  on  the  other  hand,  made  timid  by  these  prospects,  we  construct 
our  model  in  too  simple  a  fashion,  we  soon  find  that  it  does  not  predict 
to  suit  our  tastes. 

It follows that the Scientist, like the Pilgrim, must wend a straight
and  narrow  path  between  the  Pitfalls  of  Oversimplification  and  the 
Morass  of  Overcomplication. 

Knowing  that  no  mathematical  model  can  yield  a  complete  description 
of  reality,  we  must  resign  ourselves  to  the  task  of  using  a  succession  of 
models  of  greater  and  greater  complexity  in  our  efforts  to  understand. 
If we observe similar structural features possessed by the solutions of a
sequence of models, then we may feel that we have an approximation
to what is called a "law of nature."

It follows that from a teleological point of view the particular numerical
solution of any particular set of equations is of far less importance than
the understanding of the nature of the solution, which is to say the
influence of the physical properties of the system upon the form of the
solution. 

Now let us see how this idea guides us to a new formulation of these
decision processes, and indeed of some other processes of analysis which
are  not  usually  conceived  of  as  decision  processes.  In  the  conventional 
formulation,  we  consider  the  entire  multi-stage  decision  process  as 
essentially one stage, at the expense of vastly increasing the dimension
of the problem. Thus, if we have an N-stage process where M decisions
are to be made at each stage, the classical approach envisages an
MN-dimensional single-stage process. The fundamental problem that
confronts us is: How can we avoid this multiplication of dimension which
stifles analysis and greatly impedes computation?

In  order  to  answer  this,  let  us  turn  to  the  previously  enunciated 
principle  that  it  is  the  structure  of  the  policy  which  is  essential.  What 
does this mean precisely? It means that we wish to know the
characteristics of the system which determine the decision to be made at any
particular stage of the process. Put another way, in place of determining
the  optimal  sequence  of  decisions  from  some  fixed  state  of  the  system, 
we  wish  to  determine  the  optimal  decision  to  be  made  at  any  state  of 
the  system.  Only  if  we  know  the  latter,  do  we  understand  the  intrinsic 
structure  of  the  solution. 
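
The contrast between the two formulations can be made concrete on a toy process. The sketch below is only an illustration under assumptions of our own: a stock of integer units is spent over a fixed number of stages, spending q units in a stage returns sqrt(q), and the names `reward`, `brute_force`, and `stagewise` are hypothetical. The first routine lumps the whole sequence of choices into one maximization; the second tabulates the best decision for every state at every stage.

```python
from itertools import product
from math import sqrt


def reward(q):
    """Illustrative single-stage return from spending q units (an assumption)."""
    return sqrt(q)


def brute_force(stock, stages):
    """Enumerative approach: treat the whole sequence of choices as one choice."""
    best_value, best_plan, examined = float("-inf"), None, 0
    for plan in product(range(stock + 1), repeat=stages):
        examined += 1
        if sum(plan) > stock:
            continue  # infeasible: spends more than is available
        value = sum(reward(q) for q in plan)
        if value > best_value:
            best_value, best_plan = value, plan
    return best_value, best_plan, examined


def stagewise(stock, stages):
    """Backward recursion f_k(s) = max_q [reward(q) + f_{k-1}(s - q)]."""
    f = [0.0] * (stock + 1)   # f_0(s) = 0: no stages remain
    policy = []               # policy[k - 1][s]: best q in state s with k stages left
    for _ in range(stages):
        new_f, decision = [], []
        for s in range(stock + 1):
            q_best = max(range(s + 1), key=lambda q: reward(q) + f[s - q])
            decision.append(q_best)
            new_f.append(reward(q_best) + f[s - q_best])
        f = new_f
        policy.append(decision)
    return f, policy
```

With a stock of 4 and 3 stages, `brute_force` examines 5**3 candidate sequences and answers one question, while `stagewise` performs a one-dimensional search per state and returns the best decision for every state, which is the "structure" the text asks for.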

The  mathematical  advantage  of  this  formulation  lies  first  of  all  in 
the  fact  that  it  reduces  the  dimension  of  the  process  to  its  proper  level, 
namely  the  dimension  of  the  decision  which  confronts  one  at  any  particular 
stage. This makes the problem analytically more tractable and
computationally vastly simpler. Secondly, as we shall see, it furnishes us with
a type of approximation which has a unique mathematical property,
that of monotonicity of convergence, and is well suited to applications,
namely,  "approximation  in  policy  space”. 

The  conceptual  advantage  of  thinking  in  terms  of  policies  is  very 
great.  It  affords  us  a  means  of  thinking  about  and  treating  problems 
which  cannot  be  profitably  discussed  in  any  other  terms.  If  we  were  to 
hazard a guess as to which direction of research would achieve the greatest
success in the future of multi-dimensional processes, we would
unhesitatingly choose this one.

The  theme  of  this  volume  will  be  the  application  of  this  concept  of 
a  solution  to  a  number  of  processes  of  varied  type  which  we  shall 
discuss  below. 

The title is also derived in this way. The problems we treat are
programming problems, to use a terminology now popular. The adjective
"dynamic," however, indicates that we are interested in processes in
which  time  plays  a  significant  role,  and  in  which  the  order  of  operations 
may  be  crucial.  However,  an  essential  feature  of  our  approach  will  be 
the reinterpretation of many static processes as dynamic processes in
which time can be artificially introduced.

Let us now turn to a discussion of the contents.

In  the  first  chapter  we  consider  a  multi-stage  allocation  process  of 
deterministic  type  which  is  a  prototype  of  a  general  class  of  problems 
encountered  in  various  phases  of  logistics,  in  multi-stage  investment 
processes, in the study of optimal purchasing policies, and in the
treatment of many other economic processes. From the mathematical point
of  view,  the  problem  is  related  to  multi-dimensional  maximization 
problems,  and  ultimately,  as  will  be  indicated  below,  to  the  calculus 
of  variations. 

We  shall  first  discuss  the  process  in  the  conventional  manner  and 
observe  the  dimensional  difficulties  which  arise  from  the  discussion  of 
even  very  simple  processes.  Then  we  shall  introduce  the  fundamental 
technique  of  the  theory,  the  conversion  of  the  original  maximization 
problem into the problem of determining the solution of a functional
equation. 

The  functional  equations  which  arise  in  this  way  are  of  a  novel  type, 
completely  different  from  any  of  the  functional  equations  encountered 
in  classical  analysis.  The  particular  one  we  shall  employ  for  purposes 
of  discussion  in  this  chapter  is 

(1)  $f(x) = \max_{0 \le y \le x} \left[\, g(y) + h(x - y) + f(ay + b(x - y)) \,\right],$

where  g  and  h  are  known  functions  and  a  and  b  are  known  constants, 
satisfying  the  condition  0  <  a,  b  <  1. 
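
As a rough numerical companion to (1), the sketch below applies successive approximations on a grid: start from f = 0 and repeatedly replace f(x) by the maximum over y in [0, x] of g(y) + h(x - y) + f(ay + b(x - y)). Everything beyond the equation itself is an assumption made for illustration: the names `solve` and `interpolate`, the grid sizes, the starting guess, and the use of piecewise-linear interpolation between grid points.

```python
def interpolate(step, values, x):
    """Piecewise-linear interpolation of values tabulated at 0, step, 2*step, ..."""
    pos = x / step
    i = min(int(pos), len(values) - 2)
    t = pos - i
    return (1 - t) * values[i] + t * values[i + 1]


def solve(g, h, a, b, x_max=1.0, n_grid=50, n_choices=50, sweeps=200, tol=1e-10):
    """Successive approximations for f(x) = max_{0<=y<=x} [g(y) + h(x-y) + f(ay + b(x-y))]."""
    step = x_max / n_grid
    xs = [i * step for i in range(n_grid + 1)]
    f = [0.0] * (n_grid + 1)                      # initial guess f_0 = 0
    for _ in range(sweeps):
        new_f = []
        for x in xs:
            best = float("-inf")
            for k in range(n_choices + 1):
                y = x * k / n_choices             # candidate allocation in [0, x]
                # since 0 < a, b < 1 the next state a*y + b*(x - y) stays inside the grid
                value = g(y) + h(x - y) + interpolate(step, f, a * y + b * (x - y))
                best = max(best, value)
            new_f.append(best)
        change = max(abs(u - v) for u, v in zip(new_f, f))
        f = new_f
        if change < tol:
            break
    return xs, f
```

For linear g(y) = cy and h(z) = dz the fixed point is itself linear, f(x) = kx with k = max(c/(1 - a), d/(1 - b)), which gives a convenient check on the iteration.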

After  establishing  an  existence  and  uniqueness  theorem,  we  shall 
derive  some  simple  properties  of  the  optimal  policy  which  can  be  deduced 
from simple functional properties of g and h. In particular, we shall
present  the  explicit  solution  of  some  equations  where  g  and  h  have 
various  special  forms. 

The  advantage  of  obtaining  these  solutions  lies  in  the  fact  that  they 
can be utilized to obtain approximations to the solutions of more
complicated equations, and, what is more important, approximations to the
associated  optimal  policies.  The  subject  of  approximation  leads  us  to 
the  concept  of  approximation  in  policy  space,  of  importance  and  utility 
in  both  theoretical  and  practical  discussion,  and  to  the  discussion  of 
the question of the stability of f under changes in g and h.

In  the  second  chapter  we  consider  a  multi-stage  decision  process  of 
stochastic  type  in  the  guise  of  a  gold-mining  venture  with  a  delicate 
gold-mining machine. Here we encounter the equation


(2)  $f(x, y) = \max \left[ \begin{array}{l} A:\; p_1 \left[ r_1 x + f((1 - r_1) x,\, y) \right] \\ B:\; p_2 \left[ r_2 y + f(x,\, (1 - r_2) y) \right] \end{array} \right]$


In  addition  to  pursuing  an  investigation  similar  to  that  given  in 
Chapter  I,  we  actually  obtain  a  solution  to  this  equation,  and  some  of 
its  generalizations.  The  solution  has  a  particularly  simple  and  intuitive 
form,  and  introduces  the  useful  idea  of  "decision  regions.” 
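
A small computation may help in reading (2). The sketch below assumes a truncated version of the process in which only a fixed number of decisions remain, with zero return once none are left; this truncation, the function name `gold_mining`, and the tie-breaking rule in favor of A are assumptions for illustration. Choice A yields p1 [r1 x + (value of continuing with (1 - r1) x, y)], choice B yields p2 [r2 y + (value of continuing with x, (1 - r2) y)], and the larger of the two is kept.

```python
from functools import lru_cache


def gold_mining(x, y, p1, r1, p2, r2, stages):
    """Value and best first choice for a finite-stage truncation of equation (2)."""

    @lru_cache(maxsize=None)
    def f(i, j, n):
        # State reached after i uses of A and j uses of B; n decisions remain.
        if n == 0:
            return 0.0, None
        xi = x * (1 - r1) ** i
        yj = y * (1 - r2) ** j
        value_a = p1 * (r1 * xi + f(i + 1, j, n - 1)[0])
        value_b = p2 * (r2 * yj + f(i, j + 1, n - 1)[0])
        return (value_a, "A") if value_a >= value_b else (value_b, "B")

    return f(0, 0, stages)
```

Because the state after any history depends only on how many times A and B have been used, the recursion is memoized on those two counts, so the work grows roughly with the square of the number of stages rather than with the number of decision sequences.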

We  show,  however,  that  some  other  generalizations  do  not  have  as 
simple  a  structure,  and  indeed,  pose  as  yet  unresolved  problems.  An 
attempt to obtain approximate solutions to these problems for a
particular region of parameter space will lead us to the continuous versions
treated  in  Chapter  VIII. 

Chapter  III  is  devoted  to  a  synthesis  of  these  processes  which  seem  so 
different  at  first  glance.  In  this  chapter  we  analyze  the  common  features 
of  the  two  processes  treated  in  the  preceding  chapters,  and  then  proceed 
to  formulate  general  versions  of  these  processes.  In  this  way  we  obtain 
the  functional  equation 

(3)  $f(p) = \max_{q} \left[\, g(p, q) + h(p, q)\, f(T(p, q)) \,\right],$

which  includes  both  of  the  preceding,  and  a  number  of  equations  of 
still  more  general  type. 
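
For a finite set of states p and decisions q, an equation of the form (3) can be explored by approximating in policy space: guess a policy, compute the return it yields, use that return to choose a better decision in each state, and repeat. The sketch below is one way to do this and is close to what is now usually called policy iteration; it is not taken from the text. The tables `g`, `h`, `T` and all function names are illustrative assumptions, and h is assumed to lie in [0, 1) so that evaluating a fixed policy converges.

```python
def evaluate(policy, g, h, T, sweeps=2000, tol=1e-12):
    """Return of a fixed policy: iterate f(p) = g(p, q) + h(p, q) f(T(p, q)), q = policy[p]."""
    f = [0.0] * len(policy)
    for _ in range(sweeps):
        new_f = [g[p][q] + h[p][q] * f[T[p][q]] for p, q in enumerate(policy)]
        if max(abs(u - v) for u, v in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


def improve(f, g, h, T):
    """In each state pick the decision that is best against the current return f."""
    return [
        max(range(len(g[p])), key=lambda q: g[p][q] + h[p][q] * f[T[p][q]])
        for p in range(len(g))
    ]


def approximate_in_policy_space(g, h, T, max_rounds=100):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(g)          # arbitrary initial policy: decision 0 everywhere
    history = []                   # returns of successive policies, for inspection
    for _ in range(max_rounds):
        f = evaluate(policy, g, h, T)
        history.append(f)
        better = improve(f, g, h, T)
        if better == policy:
            break
        policy = better
    return policy, f, history
```

The list `history` makes the monotone behavior visible: in each state the return of a later policy is at least that of an earlier one, up to the tolerance used in `evaluate`.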

Also  in  this  chapter  we  explicitly  state  the  "principle  of  optimality” 
whose  mathematical  transliteration  in  the  case  of  any  specific  process 
yields  the  functional  equation  governing  the  process.  The  concept  of 
"approximation  in  policy  space”  is  also  discussed  in  more  detail. 

In  the  following  chapter,  Chapter  IV,  a  number  of  existence  and 
uniqueness  theorems  are  established  for  several  frequently  occurring 
classes  of  equations  having  the  above  form.  Our  proofs  hinge  upon  a 
simple  lemma  which  enables  us  to  compare  two  solutions  of  the  equation 
in (3). Although these equations are highly non-linear, in many ways
they  constitute  a  natural  generalization  of  linear  equations.  For  this 
reason  alone,  aside  from  their  applications,  they  merit  study. 

In  Chapter  V,  we  discuss  a  functional  equation  derived  from  a  problem 
of  much  economic  interest  at  the  current  time,  the  “optimal  inventory” 
problem.  Here  we  show  that  the  various  techniques  we  have  discussed 
in  the  preceding  chapters  yield  the  solutions  of  some  interesting  particular 
cases. In particular, we show that the method of successive
approximations is an efficient analytic tool for the discovery of properties of the
solution  and  the  policy,  rather  than  merely  a  humdrum  means  of  obtaining 
existence and uniqueness theorems. There are many different versions
of  the  optimal  inventory  problem  and  we  restrict  ourselves  to  a  discussion 
of the mathematical model first proposed by Arrow, Harris, and Marschak,
and treated also by Dvoretzky, Kiefer, and Wolfowitz.

A particular equation of the type we shall consider is

(4)  $f(x) = \min_{y \ge x} \left[\, g(y - x) + a \left\{ \int_y^{\infty} p(s - y)\, dG(s) + f(0) \int_y^{\infty} dG(s) + \int_0^{y} f(y - s)\, dG(s) \right\} \right].$
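
A discretized form of the inventory equation can be iterated in the same spirit. The sketch below assumes integer stock levels capped at `cap`, a demand s taking the values 0, 1, 2, ... with the probabilities in `demand_probs`, an ordering cost g, a shortage penalty p, and a discount factor a; the cap, the discretization, and the names are assumptions for illustration. In each period we order up to a level y >= x, pay g(y - x), and then either meet the demand and continue with y - s units, or fall short, pay p(s - y), and continue from zero stock.

```python
def inventory(order_cost, penalty, discount, demand_probs, cap, sweeps=500, tol=1e-10):
    """Successive approximations for a discretized inventory equation of the type above."""
    f = [0.0] * (cap + 1)                 # initial guess: zero cost everywhere
    policy = list(range(cap + 1))         # placeholder policy: order nothing
    for _ in range(sweeps):
        # future[y]: discounted expected cost after ordering up to level y
        future = []
        for y in range(cap + 1):
            total = 0.0
            for s, prob in enumerate(demand_probs):
                if s > y:
                    total += prob * (penalty(s - y) + f[0])   # shortage, continue from zero
                else:
                    total += prob * f[y - s]                  # demand met, y - s units remain
            future.append(discount * total)
        new_f, new_policy = [], []
        for x in range(cap + 1):
            y_best = min(range(x, cap + 1), key=lambda y: order_cost(y - x) + future[y])
            new_policy.append(y_best)
            new_f.append(order_cost(y_best - x) + future[y_best])
        change = max(abs(u - v) for u, v in zip(new_f, f))
        f, policy = new_f, new_policy
        if change < tol:
            break
    return f, policy
```

A call such as `inventory(lambda z: 1.0 * z, lambda z: 4.0 * z, 0.9, [0.25] * 4, 8)` returns the approximate cost for each starting stock together with the level ordered up to from each stock, so the policy can be inspected state by state.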

We  then  turn  to  a  study  of  what  we  call  "bottleneck  processes."  These 
we  define  as  processes  where  a  number  of  interdependent  activities  are 
to  be  combined  for  one  common  purpose,  with  the  level  of  this  principal 
activity  dependent  upon  the  minimum  level  of  activity  of  the  components. 

Two  chapters  are  devoted  to  these  problems,  the  first,  Chapter  VI, 
of theoretical nature, and the second, Chapter VII, given over to the
actual  details  of  the  complete  solution  of  one  particular  process. 

The  problems  that  we  encounter  are  particular  cases  of  the  general 
problem,  apparently  not  treated  before  in  any  mathematical  detail,  of 
determining  the  maximum  over  z  of  the  inner  product  (x  (T),  a),  where 
x and z are connected by means of the vector-matrix equation

(5)  $dx/dt = Ax + Bz, \quad x(0) = c,$

and where there is a constraint of the form $Cz + Dx \le f$. Here x, z, c
and f are vectors and A, B, C and D are matrices. The linearity of the
operators  and  functionals  constitutes  the  principal  difficulty. 

We might observe parenthetically that it is often thought that
linearizing a problem facilitates its solution. On occasion, however,
particularly in variational problems, it frequently complicates affairs to an
enormous  degree,  since  this  linearization  renders  classical  variational 
techniques  largely  inapplicable.  In  return,  however,  the  computational 
solution  of  particular  cases  may  often  be  obtained  by  routine  procedures. 

In  Chapter  VIII,  we  return  to  the  gold-mining  process,  and  consider 
a continuous version. There are many problems, some of a quite recondite
nature,  associated  with  the  formulation  of  continuous  stochastic  decision 
processes. In the processes at hand, we are fortunate in being able to
sidestep  these  difficulties.  In  the  continuous  version,  combining  the 
classical  variational  approach  with  the  techniques  employed  in  previous 
chapters,  we  are  able  to  solve  completely  the  continuous  versions  of  a 
number  of  problems  that  were  resolutely  intractable  in  the  discrete  case. 

We  now  turn  to  the  calculus  of  variations  in  Chapter  IX,  and  show 
that  various  characteristic  problems  may  be  viewed  as  dynamic 
programming processes of continuous and deterministic type.

In geometric terms, the classical formulation is equivalent to
considering an extremal curve as a locus of points, while the dynamic
programming formulation conceives of the extremal as the envelope
of  tangents. 

Taking this latter point of view, we are able to obtain a new
formulation of some parts of the classical theory. In particular, we show how
to  obtain  partial  differential  equations,  in  terms  of  suitably  introduced 
state  variables,  for  the  principal  eigen-value  of  the  differential  equation 

(6)  $u'' + \lambda \varphi(t)\, u = 0, \quad u(0) = u(1) = 0.$

Furthermore,  we  provide  a  new  computational  approach  to  variational 
problems  with  constraints. 

In  Chapter  X,  we  consider  dynamic  programming  processes  involving 
two  decision-makers,  essentially  opposed  to  each  other  in  their  interests. 
This  leads  to  the  discussion  of  multi-stage  games,  and,  in  particular,  to 
the  very  interesting  class  of  games  called  “games  of  survival.”  With  the 
aid  of  some  heuristic  reasoning,  we  are  able  to  obtain  a  new  rationale 
for  non-zero  sum  games,  as  a  by-product. 

The  functional  equations  encountered  in  this  domain  have  the  general 
form 

(7)  $f(p, p') = \max_{G} \min_{G'} \left[ \int\!\!\int \left[\, g(p, p', q, q') + h(p, p', q, q')\, f\big(T_1(p, p', q, q'),\, T_2(p, p', q, q')\big) \right] dG(q)\, dG'(q') \right].$

They  may  be  treated  by  means  of  the  same  general  methods  used  in 
Chapter  IV  to  discuss  the  equation  in  (3)  above. 

In  the  final  chapter,  we  consider  a  class  of  continuous  decision  processes 
which  lead  to  non-linear  differential  equations  of  the  form 

(8)  $\dfrac{dx_i}{dt} = \max_{q} \left[ \sum_{j=1}^{N} a_{ij}(t; q)\, x_j + b_i(q) \right], \quad x_i(0) = c_i, \quad i = 1, 2, \ldots, N,$

together with the corresponding equations derived from the discrete
process. 

These  equations  possess  amusing  connections  with  some  classical 
non-linear  equations,  as  we  indicate. 

In  addition  to  a  number  of  exercises  inserted  for  pedagogical  purposes, 
we  have  included  a  cross-section  of  problems  designed  to  indicate  the 
scope  of  the  application  of  the  methods  of  dynamic  programming. 

There  may  be  some  who  will  frown  upon  some  of  the  less  than  profound 
subjects which are occasionally discussed in the exercises, and used to
illustrate various types of processes. We are prepared to defend ourselves
against the charges of lèse majesté in a number of ways, but we prefer
the two following. In the first place, interesting mathematics is where
you find it, sometimes in a puzzle concerning the bridges of Koenigsberg,
sometimes  in  a  problem  concerning  the  coloring  of  maps,  or  perhaps  the 
seating  of  schoolgirls,  perhaps  in  the  determining  of  winning  play  in 
games  of  chance,  perhaps  in  an  unexpected  regularity  in  the  distribution 
of  primes.  In  the  second  place,  all  thought  is  abstract,  and  mathematical 
thought especially so. Consequently, whether we introduce our
mathematical entities under the respectable sobriquets of A and B, or by the
more  charming  Alice  and  Betty,  or  whether  we  speak  of  stochastic 
processes,  or  the  art  of  gaming,  it  is  the  mathematical  analysis  that 
counts. Any mathematical study, such as this, must be judged ultimately
upon  its  intrinsic  content,  and  not  by  the  density  of  high-sounding 
pseudo-abstractions  with  which  a  text  may  so  easily  be  salted. 

This  completes  our  synopsis  of  the  volume.  Since  the  processes  we 
consider,  the  functional  equations  which  arise,  and  the  techniques  we 
employ  are  in  the  main  novel  and  therefore  unfamiliar,  we  have  restricted 
ourselves  to  a  moderate  mathematical  level  in  order  to  emphasize  the 
principles  involved,  untrammeled  by  purely  analytic  details.  Consistent 
with this purpose we have not penetrated too deeply into any one domain
of  application  of  the  theory  from  either  the  mathematical,  economic,  or 
physical  side. 

In  every  chapter  we  have  attempted  to  avoid  any  discussion  of  deeper 
results  requiring  either  more  advanced  training  on  the  part  of  the  reader 
or  more  high-powered  analytic  argumentation.  Occasionally,  as  in 
Chapter VI and Chapter IX, we have not hesitated to waive rigorous
discussion  and  proceed  in  a  frankly  heuristic  manner. 

In  a  contemplated  second  volume  on  a  higher  mathematical  level,  we 
propose  to  rectify  some  of  these  omissions,  and  present  a  number  of 
topics  of  a  more  advanced  character  which  we  have  either  not  mentioned 
at  all  here,  mentioned  in  passing,  or  sketched  in  bold  outline.  It  will 
be  apparent  from  the  text  how  much  remains  to  be  done. 

In  this  connection  it  is  worth  indicating  a  huge,  important,  and 
relatively  undeveloped  area  into  which  this  entire  volume  represents 
merely  a  small  excursion.  This  is  the  general  study  of  the  computational 
solution  of  multi-dimensional  variational  problems.  Specifically  we  may 
pose the general problem as follows: Given a process with an associated
variational problem, how do we utilize the special features of the process
to construct a computational algorithm for solving the variational
problem?

Dynamic  programming  is  designed  to  treat  multi-stage  processes 
possessing  certain  invariant  aspects.  The  theory  of  linear  programming 
is  designed  to  treat  processes  possessing  certain  features  of  linearity,  and 
the elegant "simplex method" of G. Dantzig to a large extent solves
the problem for these processes. For certain classes of scheduling
processes, there are a variety of iterative and relaxation methods. In
particular, let us note the methods of Hitchcock, Koopmans, and Flood for
the  Hitchcock-Koopmans  transportation  problem,  and  the  "flooding 
technique”  of  A.  Boldyreff  for  railway  nets.  Furthermore,  there  is  the 
recent  theory  of  non-linear  programming  of  H.  Kuhn  and  A.  W.  Tucker 
and  E.  Beale.  The  study  of  computational  techniques  is,  however,  in 
its  infancy. 

We have taken as our audience all those interested in variational
problems, including mathematicians, statisticians, economists, engineers,
operations  analysts,  systems  engineers,  and  so  forth.  Since  the  interests 
of  various  members  of  this  audience  overlap  to  only  a  slight  degree,  some 
parts  of  the  book  will  be  of  greater  interest  to  one  group  than  another. 

For  first  readings  we  suggest  the  following  programs: 


Mathematician:       Chapters I, II, III, IV, IX, X
Economist:           Chapters I, II, III, V, IX
Statistician:        Chapters I, II, III, IX, X, XI
Engineer:            Chapters I, II, III, IX
Operations Analyst:  Chapters I, II, III, V, IX, X


Finally,  before  ending  this  prologue,  it  is  a  pleasure  to  acknowledge 
my  indebtedness  to  a  number  of  sources:  First,  to  the  von  Neumann 
theory of games as developed by J. von Neumann, O. Morgenstern, and
others, a theory which shows how to treat by mathematical analysis
vast classes of problems formerly far out of the reach of the
mathematician (and relegated, therefore, to the limbo of imponderables), and,
simultaneously,  to  the  Wald  theory  of  sequential  analysis,  as  developed 
by  A.  Wald,  D.  Blackwell,  A,  Girshick,  J.  Wolfowitz,  and  others,  a 
theory  which  shows  the  vast  economy  of  effort  that  may  be  effected  by 
the  proper  consideration  of  multi-stage  testing  processes;  second,  to  a 
number  of  colleagues  and  friends  who  have  discussed  various  aspects  of 
the  theory  with  me  and  contributed  to  its  clarification  and  growth. 

Many  of  the  results  in  this  volume  were  obtained  in  collaboration 
with  fellow  mathematicians.  The  formulation  of  games  of  survival  was 
obtained  in  conjunction  with  J.  P.  LaSalle;  the  results  on  the  optimal 
inventory  equation  were  obtained  together  with  I.  Glicksberg  and  O. 
Gross;  the  results  on  the  continuous  gold-mining  process  in  Chapter  VIII 
and the results in Chapter VII concerning specific bottleneck processes
were obtained together with S. Lehman; a number of results obtained
with H. Osborn on the connection between characteristics and Euler
equations, and on the convergence of discrete gold-mining processes to
the continuous versions will not appear in this volume. Nor shall we
include  a  study  of  the  actual  computational  solution  of  many  of  the 
processes  discussed  below,  in  which  we  have  been  engaging  in  conjunction 
with  S.  Dreyfus. 

I  should  particularly  like  to  thank  I.  Glicksberg,  O.  Gross  and  A. 
Boldyreff who read the final manuscript through with great care and
made  a  number  of  useful  suggestions  and  corrections,  and  S.  Karlin 
and  H.  N.  Shapiro  who  have  done  much  valuable  work  in  this  field  and 
from  whose  many  stimulating  conversations  I  have  greatly  benefited. 

Finally,  I  should  like  to  record  a  special  debt  of  gratitude  to  O.  Helmer 
and  E.  W.  Paxson  who  early  appreciated  the  importance  of  multi-stage 
processes  and  who,  in  addition  to  furnishing  a  number  of  fascinating 
problems  arising  naturally  in  various  important  applications,  constantly 
encouraged  me  in  my  researches. 

Santa  Monica,  California  Richard  Bellman 
CHAPTER I


A  Multi-Stage  Allocation  Process 

§  1.  Introduction 

In  this  chapter  we  wish  to  introduce  the  reader  to  a  representative 
class  of  problems  lying  within  the  domain  of  dynamic  programming  and 
to  the  basic  approach  we  shall  employ  throughout  the  subsequent  pages. 

To  begin  the  discussion  we  shall  consider  a  multi-stage  allocation 
process  of  rather  simple  structure  which  possesses  many  of  the  elements 
common  to  a  variety  of  processes  that  occur  in  mathematical  analysis, 
in  such  fields  as  ordinary  calculus  and  the  calculus  of  variations,  and 
in  such  applied  fields  as  mathematical  economics,  and  in  the  study  of 
the  control  of  engineering  systems. 

We  shall  first  formulate  the  problem  in  classical  terms  in  order  to 
illustrate some of the difficulties of this straightforward approach. To
circumvent  these  difficulties,  we  shall  then  introduce  the  fundamental 
approach  used  throughout  the  remainder  of  the  book,  an  approach  based 
upon  the  idea  of  imbedding  any  particular  problem  within  a  family  of 
similar problems. This will permit us to replace the original
multi-dimensional maximization problem by the problem of solving a system
of  recurrence  relations  involving  functions  of  much  smaller  dimension. 

As  an  approximation  to  the  solution  of  this  system  of  functional 
equations we are led to a single functional equation, the equation

(1)  $f(x) = \max_{0 \le y \le x} \, [\, g(y) + h(x - y) + f(ay + b(x - y)) \,].$

This  equation  will  be  discussed  in  some  detail  as  far  as  existence  and 
uniqueness  of  the  solution,  properties  of  the  solution,  and  particular 
solutions  are  concerned. 

Turning  to  processes  of  more  complicated  type,  encompassing  a  greater 
range  of  applications,  we  shall  first  discuss  time-dependent  processes 
and  then  derive  some  multi-dimensional  analogues  of  (1),  arising  from 
multi-stage  processes  requiring  a  number  of  decisions  at  each  stage. 
These multi-dimensional equations give rise to some difficult, and as
yet  unresolved,  questions  in  computational  analysis. 

In  the  concluding  portion  of  the  chapter  we  consider  some  stochastic 
versions of these allocation processes. As we shall see, the same analytic
methods suffice for the treatment of both stochastic and deterministic
processes. 

§ 2. A multi-stage allocation process

Let  us  now  proceed  to  describe  a  multi-stage  allocation  process  of 
simple  but  important  type. 

Assume that we have a quantity x which we divide into two
non-negative parts, y and $x - y$, obtaining from the first quantity y a
return of $g(y)$ and from the second a return of $h(x - y)$.¹ If we wish
to  perform  this  division  in  such  a  way  as  to  maximize  the  total  return 
we  are  led  to  the  analytic  problem  of  determining  the  maximum  of 
the  function 

(1)  $R_1(x, y) = g(y) + h(x - y)$

for all y in the interval [0, x]. Let us assume that g and h are continuous
functions of x for all finite $x \ge 0$ so that this maximum will always exist.

Consider  now  a  two-stage  process.  Suppose  that  as  a  price  for  obtaining 
the return $g(y)$, the original quantity y is reduced to $ay$, where a is a
constant between 0 and 1, $0 \le a < 1$, and similarly $x - y$ is reduced
to $b(x - y)$, $0 \le b < 1$, as the cost of obtaining $h(x - y)$. With the
remaining total, $ay + b(x - y)$, the process is now repeated. We set

(2)  $ay + b(x - y) = x_1 = y_1 + (x_1 - y_1),$

for $0 \le y_1 \le x_1$, and obtain as a result of this new allocation the return
$g(y_1) + h(x_1 - y_1)$ at the second stage. The total return for the
two-stage process is then

(3)  $R_2(x, y, y_1) = g(y) + h(x - y) + g(y_1) + h(x_1 - y_1),$

and the maximum return is obtained by maximizing this function of
y and $y_1$ over the two-dimensional region determined by the inequalities

(4)  a.  $0 \le y \le x,$
     b.  $0 \le y_1 \le x_1.$

¹ The units of the return are, in this case, different from the units of x. Thus,
for example, x may be in dollars, and $g(y)$ may be man hours of service from machines
purchased with the y dollars. In other cases, occurring in multi-stage investment
problems, or multi-stage production problems, this will not be so, in that the units
of the return will be the same as those of the resources, or a mixture of both situations
will occur. We are considering the simplest case here.

Let us turn our attention now to the N-stage process where we repeat
the above operation of allocation N times in succession. The total return
from the N-stage process will then be

(5)  $R_N(x, y, y_1, \ldots, y_{N-1}) = g(y) + h(x - y) + g(y_1) + h(x_1 - y_1) + \cdots + g(y_{N-1}) + h(x_{N-1} - y_{N-1}),$

where the quantities available for subsequent allocation at the end of
the first, second, ..., (N - 1)st stage are given by

(6)  $x_1 = ay + b(x - y), \quad 0 \le y \le x,$
     $x_2 = a y_1 + b(x_1 - y_1), \quad 0 \le y_1 \le x_1,$
     $\vdots$
     $x_{N-1} = a y_{N-2} + b(x_{N-2} - y_{N-2}), \quad 0 \le y_{N-2} \le x_{N-2}, \quad 0 \le y_{N-1} \le x_{N-1}.$

The maximum return will be obtained by maximizing the function $R_N$
over the N-dimensional region in the space of the variables
$y, y_1, \ldots, y_{N-1}$, described by the relations in (6).
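As a concrete reading of (5) and (6), the following Python sketch evaluates the N-stage return for a given sequence of allocations. The function name `total_return` and the linear returns used in the usage lines are illustrative assumptions, not part of the text.

```python
def total_return(x, ys, g, h, a, b):
    """Evaluate R_N of (5) for allocations ys = [y, y_1, ..., y_{N-1}].

    x is the initial quantity; after each stage the quantity available
    is updated by the rule in (6).
    """
    total = 0.0
    for y in ys:
        if not 0.0 <= y <= x:
            raise ValueError("each allocation must satisfy 0 <= y_i <= x_i")
        total += g(y) + h(x - y)      # return collected at this stage
        x = a * y + b * (x - y)       # quantity left for the next stage
    return total


# Hypothetical returns g(y) = 2y, h(z) = z with a = 0.5, b = 0.8.
g = lambda y: 2.0 * y
h = lambda z: z
print(total_return(1.0, [1.0, 0.5], g, h, a=0.5, b=0.8))
print(total_return(1.0, [0.0, 0.0], g, h, a=0.5, b=0.8))
```

Evaluating the return for one allocation sequence is cheap; the difficulty discussed next lies in searching over all admissible sequences.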

§ 3. Discussion

In  setting  out  to  solve  this  problem,  the  temptation  is,  quite  naturally, 
to  use  calculus.  If  the  absolute  maximum  occurs  inside  the  region,  which 
is to say if all the $y_i$ satisfy the strict inequalities $0 < y_i < x_i$, and if
the functions $g(x)$ and $h(x)$ possess derivatives, we obtain for the
determination of the maximizing $y_i$ the system of equations

(1)  $g'(y_{N-1}) - h'(x_{N-1} - y_{N-1}) = 0,$
     $g'(y_{N-2}) - h'(x_{N-2} - y_{N-2}) + (a - b)\, h'(x_{N-1} - y_{N-1}) = 0,$
     $\vdots$
     $g'(y) - h'(x - y) + (a - b)\, h'(x_1 - y_1) + \cdots = 0,$

upon taking partial derivatives. However, in the absence of this
knowledge, since we are interested not in local maxima, but in the absolute
maximum, we must also test the boundary values $y_i = 0$ and $y_i = x_i$, and
all combinations of boundary values and internal maxima. Furthermore,
if the solution of the equations in (1) is not unique, we must run through
a set of conditions sufficient to ensure our having a maximum and not
a minimum or a mere local maximum. It is evident that for problems
of  large  dimension,  which  is  to  say  for  processes  involving  a  large  number 
of  stages,  a  systematic  procedure  for  carrying  out  this  program  is  urgently 
required  to  keep  the  problem  from  getting  out  of  hand. 

Suppose  that  we  abdicate  as  an  analyst  in  the  face  of  this  apparently 
formidable  task  and  adopt  a  defeatist  attitude.  Turning  to  the  succor 
of modern computing machines, let us renounce all analytic tools.
Consider,  as  a  specific  example,  the  problem  posed  by  a  10-stage  process. 
Then,  if  we  wish  to  go  about  the  determination  of  the  maximum 
in  a  rudimentary  fashion  by  computing  the  value  of  the  function 
$R_{10} = R_{10}(y, y_1, \ldots, y_9)$ at suitably chosen lattice points, we may
proceed to divide all the intervals of interest, $0 \le y \le x$, $0 \le y_1 \le x_1$, ...,
$0 \le y_9 \le x_9$, into, say, ten parts, and compute the value of $R_{10}$ at each
of the $10^{10}$ points obtained in this manner. $10^{10}$ is, however, a number
that  commands  respect.  Even  the  fastest  machine  available  today  or  in 
the  near  future,  will  still  require  an  appreciable  time  to  determine  the 
solution  in  this  manner. 
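The rudimentary lattice search can be sketched in Python for a small number of stages. This is only an illustration under stated assumptions: the name `brute_force_max`, the choice of dividing each interval by fractions of the quantity currently available, and the linear returns in the usage lines are all hypothetical. Counting both endpoints, the lattice here has (parts + 1)^N points rather than the round figure quoted in the text.

```python
from itertools import product


def brute_force_max(x, n_stages, g, h, a, b, parts=10):
    """Largest N-stage return found on a lattice of allocation fractions.

    Each y_i is restricted to (j / parts) * x_i for j = 0, ..., parts, so
    the search visits (parts + 1) ** n_stages lattice points.
    """
    fractions = [j / parts for j in range(parts + 1)]
    best = float("-inf")
    for combo in product(fractions, repeat=n_stages):
        cur, total = x, 0.0
        for frac in combo:
            y = frac * cur                    # allocation at this stage
            total += g(y) + h(cur - y)
            cur = a * y + b * (cur - y)       # quantity carried forward
        best = max(best, total)
    return best


# Hypothetical returns g(y) = 2y, h(z) = z with a = 0.5, b = 0.8.
print(brute_force_max(1.0, 3, lambda y: 2.0 * y, lambda z: z, 0.5, 0.8))
```

The loop over `product(...)` is where the cost grows exponentially with the number of stages, which is the point of the estimate that follows.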

To give some idea of the magnitude of $10^{10}$, note that if the machine
took one second for the calculation of $R_{10}$ at a lattice point, storage and
comparison with other values, the computation of $10^{10}$ values would require
2.77 million hours; if one millisecond, then 2.77 thousand hours; if one
microsecond, then 2.77 hours. This last seems fairly reasonable. Observe,
however, that if we consider a 20-stage process, we must multiply any
such value by $10^{10}$, i.e., $10^{20} = 10^{10} \cdot 10^{10}$.
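The arithmetic behind these figures is easy to reproduce. The short Python sketch below uses a hypothetical helper, `lattice_hours`, and takes the text's count of parts^N lattice points as given; the quotient 10^10 / 3600 is about 2.78 million, which the text truncates to 2.77.

```python
def lattice_hours(stages, parts, seconds_per_point):
    """Hours needed to evaluate R_N once at each of parts ** stages points."""
    points = parts ** stages
    return points * seconds_per_point / 3600.0


for label, secs in [("second", 1.0), ("millisecond", 1e-3), ("microsecond", 1e-6)]:
    hours = lattice_hours(10, 10, secs)
    print(f"10 stages, one {label} per point: {hours:,.2f} hours")

# Doubling the number of stages multiplies the count of points by 10**10.
print(f"20 stages, one microsecond per point: {lattice_hours(20, 10, 1e-6):.3e} hours")
```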

Needless to say, there are various ingenious techniques that can be
employed  to  cut  this  time  down.  Nonetheless,  the  method  sketched 
above  is  still  an  unwieldy  and  inelegant  method  of  attack. 

Furthermore,  it  should  be  realized  that  if  we  are  sufficiently  interested 
in  the  solution  of  the  above  decision  process  to  engage  in  computations, 
we will, in general, wish to compute the answer not only for one particular
value of x, but for a range of values, not only for one set of values of
a  and  b  but  for  a  set  of  values,  and  not  only  for  one  set  of  functions 
g  and  h,  but  for  a  class  of  functions.  In  other  words,  we  will  perform  a 
sensitivity analysis or stability analysis of the solution. Any such
sensitivity analysis attempted by the above methods will run into fairly
large  computing  times. 

One  of  the  aspects  of  the  situation  viewed  in  these  terms  which  is 
really disheartening is that this problem is, after all, only the
consequence of a very, almost absurdly, simple version of an applied problem.
It is clear that any modification of the problem in the direction of
realism, say subdivision of x into more than two parts, which is to say
an increase in the number of activities we can engage in, or an increase
in the types of resources, will increase the computing time at an
exponential rate.

Furthermore,  as  we  have  pointed  out  in  the  Preface,  we  must  realize 
that  the  essential  purpose  in  formulating  many  of  these  mathematical 
models  of  the  universe,  economic,  physical,  biologic,  or  otherwise,  is  not 
so  much  to  calculate  numbers,  which  are  in  many  cases  of  dubious  value 
because  of  the  lack  of  knowledge  of  some  of  the  basic  constants  and 
functions  involved,  but  rather  to  determine  the  structure  of  the  solution. 
Concepts are, in many processes, more important than constants.

The  two,  however,  in  general  go  hand-in-hand.  If  we  have  a  thorough 
understanding  of  the  process,  we  have  means,  through  approximation 
techniques  of  various  sorts,  of  determining  the  constants  we  require. 
Furthermore,  in  the  processes  occurring  in  applications,  of  such  enormous 
complexity  that  trial  and  error  computation  is  fruitless,  it  is  only  by 
having an initial toe-hold on the solution that we can hope to use
computing machines effectively.

Going back to the idea of the intrinsic structure of a solution, we may
ask what it is that we really wish to know if we are studying a process
of this type. Naturally, we would like to obtain the point $(y, y_1, \ldots, y_{N-1})$
at which the maximum occurs, and any solution must furnish this. But
from the point of view of a person carrying out the process, all that is
really required at any particular stage of the process is the value of y
in terms of x, the resources available, and N, the number of stages
ahead; that is to say, the allocation to be made when the quantity
available is x and the number of stages of the process remaining is N.
Viewed as a multi-stage process, at each stage a one-dimensional choice
is made, a choice of y in the interval [0, x]. It follows² that there should
be a formulation of the problem which preserves this dimensionality
and saves us from becoming bogged down in the complexities of
multi-dimensional analysis.

§  4.  Functional  equation  approach 

Taking  this  as  our  goal,  namely  the  preservation  of  one-dimensionality, 
let  us  proceed  as  follows.  We  first  observe  that  the  maximum  total  return 
over an N-stage process depends only upon N and the initial quantity x.
Let us then define the function,

(1)  $f_N(x)$ = the maximum return obtained from an N-stage process
     starting with an initial quantity x, for N = 1, 2, ...,
     and $x \ge 0$.

² As an application of the useful principle of wishful thinking.

We have

(2)  $f_N(x) = \max_{\{y,\, y_i\}} R_N(x, y, y_1, \ldots, y_{N-1}), \quad N = 2, 3, \ldots,$

with

(3)  $f_1(x) = \max_{0 \le y \le x} \, [\, g(y) + h(x - y) \,].$

Our first objective is to obtain an equation for $f_2(x)$ in terms of $f_1(x)$.
Considering the two-stage process, we see that the total return will be
the return from the first stage plus the return from the second stage,
at which stage we have an amount $ay + b(x - y)$ left to allocate. It is
clear that whatever the value of y chosen initially, this remaining amount,
$ay + b(x - y)$, must be used in the best possible manner for the
remaining stage, if we wish to obtain a two-stage allocation which
maximizes. 

This  observation,  simple  as  it  is,  is  the  key  to  all  of  our  subsequent 
mathematical  analysis.  It  is  worthwhile  for  the  reader  to  pause  here  a 
moment  and  make  sure  that  he  really  agrees  with  this  observation, 
which  has  the  deceptive  simplicity  of  a  half-truth. 

It  follows  that  as  a  result  of  an  initial  allocation  of  y  we  will  obtain 
a total return of $f_1(ay + b(x - y))$ from the second stage of our
two-stage process, if $y_1$ is chosen optimally. Consequently, for the total
return from the two-stage process resulting from the initial allocation
of y, we have the expression

(4)  $R_2(x, y, y_1) = g(y) + h(x - y) + f_1(ay + b(x - y)).$

Since  y  is  to  be  chosen  to  yield  the  maximum  of  this  expression,  we 
derive  the  recurrence  relation 

(5)  $f_2(x) = \max_{0 \le y \le x} \, [\, g(y) + h(x - y) + f_1(ay + b(x - y)) \,],$

connecting the functions $f_1(x)$ and $f_2(x)$. Using precisely the same
argumentation for the N-stage process, we obtain the basic functional
equation

(6)  $f_N(x) = \max_{0 \le y \le x} \, [\, g(y) + h(x - y) + f_{N-1}(ay + b(x - y)) \,]$

for $N \ge 2$, with $f_1(x)$ defined as in (3) above.

Starting with $f_1(x)$, as determined by (3), we use (6) to compute $f_2(x)$,
which, in turn, repeating the process, yields $f_3(x)$, and so on. At each
step of the computation, we obtain, not only $f_k(x)$, but also $y_k(x)$, the
optimal allocation to be made at the beginning of a k-stage process,
starting with an amount x.

The solution, then, consists of a tabulation of the sequence of functions
$\{y_k(x)\}$ and $\{f_k(x)\}$ for $x \ge 0$, k = 1, 2, ....
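One way to carry out such a tabulation numerically is sketched below in Python. It is a minimal sketch under assumptions that are not in the text: x is restricted to a uniform grid on [0, x_max], the candidate allocations are a fixed set of fractions of x, and values of the previous function between grid points are obtained by linear interpolation. The names `tabulate` and `interpolate`, and the linear returns in the usage lines, are hypothetical.

```python
def interpolate(grid, values, x):
    """Piecewise-linear interpolation of tabulated values on a uniform grid."""
    step = grid[1] - grid[0]
    i = min(int(x / step), len(grid) - 2)
    t = (x - grid[i]) / step
    return (1.0 - t) * values[i] + t * values[i + 1]


def tabulate(g, h, a, b, x_max, n_stages, n_grid=101, n_choices=51):
    """Tabulate f_k and y_k for k = 1, ..., n_stages using recurrence (6)."""
    grid = [x_max * i / (n_grid - 1) for i in range(n_grid)]
    f_prev = [0.0] * n_grid            # f_0 = 0, so the first pass yields (3)
    f_tables, y_tables = [], []
    for _ in range(n_stages):
        f_cur, y_cur = [], []
        for x in grid:
            best_val, best_y = float("-inf"), 0.0
            for j in range(n_choices):
                y = x * (j / (n_choices - 1))     # candidate allocation in [0, x]
                left = a * y + b * (x - y)        # amount carried to the next stage
                val = g(y) + h(x - y) + interpolate(grid, f_prev, left)
                if val > best_val:
                    best_val, best_y = val, y
            f_cur.append(best_val)
            y_cur.append(best_y)
        f_tables.append(f_cur)
        y_tables.append(y_cur)
        f_prev = f_cur
    return grid, f_tables, y_tables


# Hypothetical linear returns g(y) = 2y, h(z) = z with a = 0.5, b = 0.8.
grid, f_tables, y_tables = tabulate(lambda y: 2.0 * y, lambda z: z,
                                    a=0.5, b=0.8, x_max=1.0, n_stages=3)
print(f_tables[2][-1], y_tables[2][-1])   # approximations to f_3(1) and y_3(1)
```

Each pass over the grid solves a family of one-dimensional maximizations, so the work grows linearly with the number of stages rather than exponentially. For the linear returns used here each f_k is a multiple of x, so interpolation introduces no error and the printed value should be close to 3.5; for general g and h the grid and the set of candidate allocations both limit the accuracy.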

Given the sequence of functions $\{y_k(x)\}$, the solution of a specific
problem, involving a given N and a given x, has the form

(7)  $y = y_N(x),$
     $y_1 = y_{N-1}(ay + b(x - y)),$
     $y_2 = y_{N-2}(a y_1 + b(x_1 - y_1)),$
     $\vdots$
     $y_{N-1} = y_1(a y_{N-2} + b(x_{N-2} - y_{N-2})),$

where $(y, y_1, \ldots, y_{N-1})$ is a set of allocations which maximizes the total
N-stage return.

A digital computer may be programmed to print out the sequence of
values $y, y_1, \ldots, y_{N-1}$, in addition to tabulating the sequences $\{f_k(x)\}$
and $\{y_k(x)\}$.
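The chain of substitutions in (7) amounts to a forward pass through the process. A minimal Python sketch follows; the function name `rollout` is hypothetical, and the policy rules in the usage lines are stand-ins for tabulated functions y_k(x). They are the rules that the recurrence gives by hand for the assumed linear returns g(y) = 2y, h(z) = z with a = 0.5, b = 0.8, where f_k(x) is a multiple of x.

```python
def rollout(x, policies, a, b):
    """Recover [y, y_1, ..., y_{N-1}] from policy rules as in (7).

    policies[k - 1] is the rule y_k(.) to use when k stages remain,
    so an N-stage problem starts with policies[N - 1].
    """
    allocations = []
    for k in range(len(policies), 0, -1):
        y = policies[k - 1](x)        # allocation with k stages remaining
        allocations.append(y)
        x = a * y + b * (x - y)       # quantity available at the next stage
    return allocations


# Assumed rules for the linear example: allocate everything to the first
# activity when 1, 2 or 3 stages remain, and nothing when 4 remain.
policies = [lambda x: x, lambda x: x, lambda x: x, lambda x: 0.0]
print(rollout(1.0, policies, a=0.5, b=0.8))
```

Only the single value x and the number of stages remaining are needed at each step, which is the one-dimensional structure the functional equation was designed to preserve.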

§  5.  Discussion 

The  important  fact  to  observe  is  that  we  have  attempted  to  solve  a 
maximization  problem  involving  a  particular  value  of  x  and  a  particular 
value  of  N  by  first  solving  the  general  problem  involving  an  arbitrary 
value  of  x  and  an  arbitrary  value  of  N.  In  other  words,  as  we  promised 
in  the  first  section,  we  have  imbedded  the  original  problem  within  a 
family of similar problems. We shall exploit this basic method of
mathematical analysis throughout the book.

What  are  the  advantages  of  this  approach?  In  the  first  place,  we  have 
reduced a single N-dimensional problem to a sequence of N
one-dimensional problems. The computational advantages of this formulation
are  obvious,  and  we  shall  proceed  in  the  next  sections  to  show  that  there 
are  analytic  advantages  as  well,  as  might  be  suspected.  As  we  shall  see, 
we  will  be  able  to  obtain  explicit  solutions  for  large  classes  of  functions 
g  and  h,  which  can  be  used  for  approximation  purposes.  This  point  will 
be  discussed  again  below.  Furthermore,  we  will  be  able  to  determine 
many  important  structural  features  of  the  solution  even  in  those  cases 
where we cannot solve completely. The utilization of structural properties
of  the  solution  and  the  reduction  in  dimension  combine  to  furnish 
computing  techniques  which  greatly  reduce  the  time  required  to  solve 
the  original  problem.  We  shall  return  to  this  point  in  connection  with 
some  multi-dimensional  versions. 
§ 6. A multi-dimensional maximization problem

Before  proceeding  to  a  more  detailed  theory  of  the  processes  described 
above,  let  us  digress  for  a  moment  and  briefly  present  two  further 
applications  of  the  general  method. 

For  the  first  application,  consider  the  problem  of  determining  the 
maximum of the function

(1)  $F(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} g_i(x_i),$

over the region defined by

(2)  (a)  $x_1 + x_2 + \cdots + x_N = c,$
     (b)  $x_i \ge 0.$

Each function $g_i(x)$ is assumed to be continuous for all $x \ge 0$.

Since the maximum of F depends only upon c and N, let us define
the sequence of functions

(3)  $f_N(c) = \max_{\{x_i\}} F(x_1, x_2, \ldots, x_N),$

for $c \ge 0$ and N = 1, 2, ....

Then, arguing as above, we have the recurrence relation

(4)  $f_N(c) = \max_{0 \le x \le c} \, [\, g_N(x) + f_{N-1}(c - x) \,],$

for N = 2, 3, ..., with

(5)  $f_1(c) = g_1(c).$
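Recurrence (4) with the starting value (5) can be sketched in Python when the resource is restricted to whole units. That restriction is an assumption made here for illustration (the text treats continuous x), as are the function name `max_separable` and the square-root returns in the usage lines.

```python
from math import sqrt


def max_separable(gs, c):
    """Maximum of sum g_i(x_i) over non-negative integers x_i summing to c.

    gs is the list [g_1, ..., g_N]; f[u] holds f_k(u) for the current k.
    """
    f = [gs[0](u) for u in range(c + 1)]                 # (5): f_1(u) = g_1(u)
    for g in gs[1:]:
        # (4): give x units to the new activity, the rest to the earlier ones.
        f = [max(g(x) + f[u - x] for x in range(u + 1))
             for u in range(c + 1)]
    return f[c]


# Hypothetical returns g_1(x) = sqrt(x), g_2(x) = 2 sqrt(x), with c = 5 units.
print(max_separable([sqrt, lambda x: 2.0 * sqrt(x)], 5))
```

The table `f` is rebuilt once per activity, so the work is of order N c^2 evaluations instead of a search over all ways of splitting c into N parts.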

§  7.  A  “smoothing”  problem 

As  the  second  application,  let  us  consider  the  problem  of  determining 
the sequence $\{x_k\}$ which minimizes the function

(1)  $F(x_1, x_2, \ldots, x_N) = \sum_{k=1}^{N} g_k(x_k - r_k) + \sum_{k=1}^{N} h_k(x_k - x_{k-1}).$

Here $\{r_k\}$ is a given sequence, $x_0 = c$ a given constant, and we assume
that the functions $g_k(x)$ and $h_k(x)$ are continuous for all finite x, and
that $g_k(x), h_k(x) \to \infty$ as $|x| \to \infty$.

The  genesis  of  this  problem,  explaining  its  name,  will  be  discussed 
in  the  exercises. 
Let us define the sequence $\{f_R(c)\}$, R = 1, 2, ..., N, by the property
that $f_R(c)$ is the minimum over all $x_R, x_{R+1}, \ldots, x_N$ of the function

(2)  $\sum_{k=R}^{N} g_k(x_k - r_k) + \sum_{k=R}^{N} h_k(x_k - x_{k-1}),$

where $x_{R-1} = c$.

We have

(3)  $f_N(c) = \min_{x} \, [\, g_N(x - r_N) + h_N(x - c) \,],$

and

(4)  $f_R(c) = \min_{x} \, [\, g_R(x - r_R) + h_R(x - c) + f_{R+1}(x) \,],$

for R = 1, 2, ..., N - 1.
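The backward recurrence (3) and (4) can be sketched in Python if each x_k is restricted to a finite set of candidate values. That restriction, the use of a single pair of functions g and h for every k, and the name `smooth` are simplifying assumptions made for this illustration only.

```python
def smooth(r, c, grid, g, h):
    """Minimize sum_k g(x_k - r_k) + h(x_k - x_{k-1}) with x_0 = c.

    Each x_k is chosen from grid. Returns (minimum cost, [x_1, ..., x_N]).
    """
    states = list(grid) if c in grid else list(grid) + [c]
    f_next = {s: 0.0 for s in states}     # value beyond the last stage is 0
    best = []                             # minimizing x_k for each previous value
    for r_k in reversed(r):
        f_cur, arg = {}, {}
        for s in states:                  # s plays the role of c in (3) and (4)
            x_best = min(grid, key=lambda x: g(x - r_k) + h(x - s) + f_next[x])
            arg[s] = x_best
            f_cur[s] = g(x_best - r_k) + h(x_best - s) + f_next[x_best]
        best.append(arg)
        f_next = f_cur
    best.reverse()                        # best[0] now belongs to stage 1
    xs, s = [], c
    for arg in best:                      # forward pass recovers the sequence
        s = arg[s]
        xs.append(s)
    return f_next[c], xs


# Hypothetical data: quadratic fit penalty and a charge of 4 per unit of change.
cost, xs = smooth([0, 10, 0], 0, list(range(11)),
                  g=lambda u: u * u, h=lambda u: 4 * abs(u))
print(cost, xs)
```

With h identically zero the minimizing sequence simply follows r_k, and with g identically zero it stays at c; intermediate choices trade closeness to the given sequence against the size of successive changes.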

§  8.  Infinite  stage  approximation 

Let  us  now  return  to  the  allocation  process.  The  treatment  we  present 
here  serves  as  a  prototype  for  the  discussion  of  a  number  of  multi-stage 
processes, of diverse origin, but similar analytic structure.

If N is large, it is reasonable to consider as an approximation to the
N-stage process, the infinite stage process defined by the requirement
that the process continue indefinitely. Although an unbounded process
is always a physical fiction,³ as a mathematical process it has many
attractive features. One immediate advantage of this approximation
lies in the fact that in place of the sequence of equations given by (4.6),
we now have the single equation

(1)  $f(x) = \max_{0 \le y \le x} \, [\, g(y) + h(x - y) + f(ay + b(x - y)) \,]$

satisfied by $f(x)$, the total return of the process, with a single allocation
function $y = y(x)$, determined by the equation.
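To see what a solution of (1) can look like, it may help to work one case by hand. The example below is an assumption introduced for illustration, not taken from the text: linear returns g(y) = αy and h(y) = βy with α, β ≥ 0 and 0 ≤ a, b < 1, together with the trial solution f(x) = kx.

```latex
% Assumed example: g(y) = \alpha y, \; h(y) = \beta y, \; \alpha, \beta \ge 0.
% Substitute the trial solution f(x) = kx into equation (1):
\begin{align*}
kx &= \max_{0 \le y \le x} \bigl[\alpha y + \beta (x - y) + k\,(ay + b(x - y))\bigr] \\
   &= \max_{0 \le y \le x} \bigl[(\beta + kb)\,x
        + \bigl((\alpha + ka) - (\beta + kb)\bigr)\,y\bigr] \\
   &= \max(\alpha + ka,\; \beta + kb)\,x .
\end{align*}
% The bracket is linear in y, so the maximum is attained at y = 0 or y = x.
% Hence k = \max(\alpha + ka, \beta + kb), which is satisfied by
\[
k = \max\!\left(\frac{\alpha}{1 - a},\; \frac{\beta}{1 - b}\right),
\]
% with y(x) = x when the first entry is the larger and y(x) = 0 when the second is.
```

In this special case the single allocation function is the same at every stage: the whole quantity goes to whichever activity has the larger discounted return per unit.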

To  balance  this,  we  encounter  many  of  the  usual  difficulties  associated 
with  infinite  processes.  It  is,  first  of  all,  no  longer  clear  that  a  maximum 
exists rather than a supremum. This is to say, there may be no allocation
policy which actually yields the total return $f(x)$. Furthermore, if we
wish to employ (1) in an unrestricted fashion to determine properties
of  the  infinite  process,  we  must  show  that  it  possesses  no  extraneous 
solutions.  In  other  words,  we  must  establish  existence  and  uniqueness 
theorems  if  this  equation  is  to  serve  a  useful  purpose. 

³ We shall occasionally use the word "physical" to describe the "real" world.
It should be interpreted to mean economic, biological, engineering, etc., depending
upon the background and interests of the reader.

§ 9. Existence and uniqueness theorems

The  result  we  obtain  in  this  section  is  actually  a  special  case  of  a 
more  general  result  we  shall  derive  in  a  later  chapter.  Repetition,  however, 
no  matter  how  dismaying  as  a  social  or  literary  attribute,  is  no  great 
mathematical  sin,  and  it  is  important  to  present  the  simpler  case  first, 
enabling  the  basic  ideas  to  appear  unimpeded  by  technicalities  of  lesser 
import. 

Let  us  now  demonstrate 
Theorem  1.  Let  us  assume  that 

(1)  a.  $g(x)$ and $h(x)$ are continuous functions of x for $x \ge 0$,
         $g(0) = h(0) = 0$.

     b.  If $m(x) = \max_{0 \le y \le x} \max\,(|g(y)|, |h(y)|)$, and
         $c = \max\,(a, b)$, then $\sum_{n=0}^{\infty} m(c^n x) < \infty$ for all $x \ge 0$.

     c.  $0 \le a < 1$, $0 \le b < 1$.

Under these assumptions, there is a unique solution to (8.1) which is
continuous at x = 0, and has the value 0 at this point; moreover, this function
is continuous.
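Condition (1b) is easy to check for simple classes of functions. The following worked verification uses assumed examples that are not in the text: linear returns, and more generally returns bounded by a power of x.

```latex
% Assumed example 1: g(x) = \alpha x, \; h(x) = \beta x, with M = \max(|\alpha|, |\beta|).
\[
m(x) = \max_{0 \le y \le x} \max\bigl(|\alpha| y,\, |\beta| y\bigr) = M x,
\qquad
\sum_{n=0}^{\infty} m(c^n x) = M x \sum_{n=0}^{\infty} c^n = \frac{M x}{1 - c} < \infty ,
\]
% since 0 \le c = \max(a, b) < 1 by condition (1c).
%
% Assumed example 2: |g(x)| \le M x^p and |h(x)| \le M x^p for some p > 0.
\[
m(x) \le M x^p,
\qquad
\sum_{n=0}^{\infty} m(c^n x) \le M x^p \sum_{n=0}^{\infty} c^{np} = \frac{M x^p}{1 - c^p} < \infty .
\]
```

In both cases g(0) = h(0) = 0 holds as well, so the hypotheses of the theorem are met whenever g and h are continuous.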

Before  proceeding  to  the  proof,  let  us  digress  for  a  moment  and 
consider  the  important  special  case  where  g  and  h  are  both  non-negative. 
The sequence $\{f_N(x)\}$ as given by (4.6) is a monotone increasing sequence,
with boundedness a consequence of condition (1b), as we shall show
below in a moment. Consequently, for all $x \ge 0$, $f_N(x)$ converges to a
function $f(x)$ as $N \to \infty$.

Let  us  show  that  this  function  satisfies  the  equation 

(2)  $f(x) = \sup_{0 \le y \le x} \, [\, g(y) + h(x - y) + f(ay + b(x - y)) \,].$

To simplify our notation, let us set

(3)  $T(f, y) = g(y) + h(x - y) + f(ay + b(x - y)).$

The basic recurrence relation is then

(4)  $f_{N+1}(x) = \max_{0 \le y \le x} T(f_N, y).$
From (4) we obtain as a consequence of the monotonicity in N,

(5)  $f(x) \ge \max_{0 \le y \le x} T(f_N, y).$

For any y in the interval [0, x], this means that the inequality

(6)  $f(x) \ge T(f_N, y)$

holds. Letting $N \to \infty$, this yields

(7)  $f(x) \ge T(f, y)$

for all y in [0, x], which, in turn, leads to the result

(8)  $f(x) \ge \sup_{0 \le y \le x} T(f, y).$

We cannot write $\max_{0 \le y \le x}$ since we have no guarantee that the
limit function $f(x)$ is actually continuous as a function of x.

On the other hand, from (4) we also obtain

(9)  $f_{N+1}(x) \le \sup_{0 \le y \le x} T(f, y),$

for all N, and thus

(10)  $f(x) \le \sup_{0 \le y \le x} T(f, y).$

Comparing (8) and (10), we obtain (2).

One  of  the  defects  of  this  proof  based  solely  upon  monotonicity  is 
that  it  does  not  yield  the  continuity  of  the  limit  function,  a  result  which 
implies  the  existence  of  an  optimal  policy.  This  optimal  policy  is  a 
function $y(x)$ which yields the maximum in

(11)  $f(x) = \max_{0 \le y \le x} T(f, y),$

when  the  maximum  exists. 

The  existence  of  an  optimal  policy  for  the  infinite  process  is  directly 
of  no  particular  importance  computationally,  or  as  far  as  applications 
are  concerned.  It  is,  however,  of  great  importance  in  connection  with 
the  determination  of  the  structure  of  optimal  policies  for  the  infinite 
process.  Thus,  indirectly,  the  question  of  the  existence  of  continuous 
solutions  is  significant  as  far  as  numerical  results  are  concerned,  since 
the  solution  of  the  infinite  process  can  be  used  as  an  approximation  to 
the  solution  of  the  finite. 

In order to establish the existence and uniqueness of a continuous
solution of (11), we shall employ a technique that is applicable to a large
class  of  equations  of  this  type,  the  method  of  successive  approximations. 
We  shall,  however,  encounter  monotonicity  arguments  again  in  later 
chapters. 

Turning to the recurrence relations in (4), let us begin with the
observation that $f_1(x)$ is continuous for all $x \ge 0$ by virtue of the assumptions
made concerning $g(x)$ and $h(x)$. It follows, inductively, that each
element of the sequence $\{f_N(x)\}$ is continuous. It is worth pointing
out, however, that the location of the maximizing y need not depend
continuously  upon  x.  In  other  words,  the  policy  is  not  necessarily  a 
continuous  function  of  x.  An  example  of  this  is  given  in  §  15. 

Let $y_N(x)$ be a value of y which yields the maximum in (4). It is a
matter of indifference as to which value of y we choose, if there is more
than one value producing the maximum. Then we have

(12)  $f_{N+1}(x) = T(f_N, y_N),$
      $f_{N+2}(x) = T(f_{N+1}, y_{N+1}),$

and, as a consequence of the maximum property of the $y_N$, the further
inequalities

(13)  $f_{N+1}(x) = T(f_N, y_N) \ge T(f_N, y_{N+1}),$
      $f_{N+2}(x) = T(f_{N+1}, y_{N+1}) \ge T(f_{N+1}, y_N).$

These, in turn, yield

(14)  $f_{N+1}(x) - f_{N+2}(x) \ge T(f_N, y_{N+1}) - T(f_{N+1}, y_{N+1}),$
      $f_{N+1}(x) - f_{N+2}(x) \le T(f_N, y_N) - T(f_{N+1}, y_N).$

The two inequalities combined yield the important estimate

(15)  $|f_{N+1}(x) - f_{N+2}(x)| \le \max\,[\, |T(f_N, y_{N+1}) - T(f_{N+1}, y_{N+1})|,\ |T(f_N, y_N) - T(f_{N+1}, y_N)| \,].$

Turning to the definition of $T(f, y)$ given in (3), we see that

(16)  $|T(f_N, y_N) - T(f_{N+1}, y_N)| = |f_N(a y_N + b(x - y_N)) - f_{N+1}(a y_N + b(x - y_N))|.$

Let us now define

(17)  $u_N(x) = \max_{0 \le z \le x} |f_N(z) - f_{N+1}(z)|, \quad N = 1, 2, \ldots.$

Since $ay + b(x - y) \le cx$ for all y in [0, x], the relation in (16) yields

(18)  $u_{N+1}(x) \le u_N(cx).$

It remains to estimate $u_1(x)$. We have, referring to the equations
for $f_1(x)$ and $f_2(x)$, the relation

(19)  $|f_1(x) - f_2(x)| \le \max\,[\, |f_1(a y_1 + b(x - y_1))|,\ |f_1(a y_2 + b(x - y_2))| \,] \le m(cx),$

using the definition of $m(x)$ given in (1b), where $y_1$ and $y_2$ denote
maximizing values of y in the equations for $f_1(x)$ and $f_2(x)$.

Hence we see that $u_1(x) \le m(cx)$, and thus, using (18), that
$u_N(x) \le m(c^N x)$. By virtue of our assumption concerning $m(x)$ it follows
that $\sum_{N=1}^{\infty} u_N(x)$ converges for all x, and what is important, uniformly in
any finite interval. The limit function $f(x) = \lim_{N \to \infty} f_N(x)$, in consequence,
exists and is continuous for all x. Furthermore, the uniformity of
convergence ensures that $f(x)$ is a solution of (8.1).

It remains to establish the uniqueness of the solution. Let F(x) be
any other solution which exists for all x and is continuous at x = 0,
with F(0) = 0.

In the equation

(20)  $f(x) = \max_{0 \le y \le x} T(f, y)$,

let y = y(x) be a value of y which yields the maximum, and let w = w(x)
play the similar role in

(21)  $F(x) = \max_{0 \le y \le x} T(F, y)$.

Then, as above, we obtain the two inequalities,

(22)  $f(x) = T(f, y) \ge T(f, w)$,

      $F(x) = T(F, w) \ge T(F, y)$,

and, as before, this leads to

(23)  $|f(x) - F(x)| \le \max [\, |T(f, y) - T(F, y)|, \ |T(f, w) - T(F, w)| \,]$

      $\le \max [\, |f(ay + b(x - y)) - F(ay + b(x - y))|, \ |f(aw + b(x - w)) - F(aw + b(x - w))| \,]$.

Let us now define

(24)  $u(x) = \sup_{0 \le z \le x} |f(z) - F(z)|$.

Since f(x) is continuous for all $x \ge 0$ and F(x) is, by assumption,
continuous at x = 0, we see that u(x) is continuous at x = 0 and has the
value 0 there.

From (23) we obtain

(25)  $u(x) \le u(cx)$,

whence, by iteration,

(26)  $u(x) \le u(c^N x)$,

for all $N \ge 1$. Since u(x) is continuous at x = 0 and u(0) = 0, upon
letting $N \to \infty$ we obtain $u(x) \le 0$, and thus that f(x) = F(x). This
completes the proof of the existence and uniqueness of a solution of the
functional equation associated with the infinite process.
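
As a numerical illustration of the estimate $u_{N+1}(x) \le u_N(cx)$, here is a minimal sketch of the successive approximations carried out on a grid. The return functions g and h, the values of a and b, the grid sizes and all helper names are assumptions made for the example; none of them come from the text. Values between grid points are filled in by linear interpolation, so the tabulated gaps only approximate the quantities $u_N$.

```python
import math

A, B = 0.5, 0.8          # assumed values of the fractions a and b


def g(y):
    return math.sqrt(y)  # assumed return function, g(0) = 0


def h(y):
    return 0.5 * y       # assumed return function, h(0) = 0


def interp(values, step, z):
    """Piecewise-linear interpolation of a tabulated function at the point z."""
    i = min(int(z / step), len(values) - 2)
    t = z / step - i
    return (1.0 - t) * values[i] + t * values[i + 1]


def iterate(f_prev, step, n_choices=50):
    """One step f_{N+1}(x) = max_y [g(y) + h(x - y) + f_N(a y + b (x - y))]."""
    f_next = []
    for k in range(len(f_prev)):
        x = k * step
        best = -math.inf
        for j in range(n_choices + 1):
            y = x * (j / n_choices)
            rest = x - y
            value = g(y) + h(rest) + interp(f_prev, step, A * y + B * rest)
            best = max(best, value)
        f_next.append(best)
    return f_next


def successive_gaps(x_max=2.0, n_grid=41, n_iter=12):
    """Grid maxima of |f_N - f_{N+1}| for N = 1, ..., n_iter."""
    step = x_max / (n_grid - 1)
    f = iterate([0.0] * n_grid, step)      # f_1(x) = max_y [g(y) + h(x - y)]
    gaps = []
    for _ in range(n_iter):
        f_new = iterate(f, step)
        gaps.append(max(abs(p - q) for p, q in zip(f, f_new)))
        f = f_new
    return gaps
```

Calling `successive_gaps()` returns a list of gaps that never increases from one iteration to the next, which is the behavior the convergence proof relies on.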

§  10.  Successive  approximations 

In considering the equation

(1)  $f(x) = \max_{0 \le y \le x} T(f, y)$,

we have shown that a particular sequence of successive approximations
converged to the unique solution which is continuous at x = 0 and
zero there. It is important for both analytic and computational purposes
to know that actually any sequence whose initial function satisfies
certain simple requirements converges.

The  methods  we  have  used  above  may  also  be  employed  to  prove  the 
following 

Theorem 2. Let $f_0(x)$ satisfy the following conditions:

(1)  a. $f_0(x)$ is continuous for $x \ge 0$.

     b. $f_0(0) = 0$.

Then, if the conditions of Theorem 1 are fulfilled, the sequence defined by

(2)  $f_{N+1}(x) = \max_{0 \le y \le x} T(f_N, y)$,  N = 0, 1, ...,

converges to the solution f(x) obtained above, uniformly in any finite interval.
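
Theorem 2 can be seen at work in a case simple enough to need no grid. Assume, purely for illustration, that g(y) = c1 y and h(y) = c2 y are linear and that the initial function is $f_0(x) = k_0 x$. Then T(f_N, y) is linear in y, the maximum falls at y = 0 or y = x, and every iterate is again linear, $f_N(x) = k_N x$ with $k_{N+1} = \max(c_1 + a k_N, c_2 + b k_N)$. The sketch below iterates the slope; the numerical values and the function name are assumptions.

```python
def iterate_slope(k0, c1=1.0, c2=1.2, a=0.5, b=0.8, n_iter=200):
    """Slope k_N of f_N(x) = k_N x after n_iter steps, starting from f_0(x) = k0 x."""
    k = k0
    for _ in range(n_iter):
        # y = x gives c1 + a k, y = 0 gives c2 + b k; the larger one is the maximum.
        k = max(c1 + a * k, c2 + b * k)
    return k
```

With these assumed values the limiting slope is max(c1/(1 - a), c2/(1 - b)) = 6, and starting slopes as different as 0, -5 and 100 all lead to it.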

§  11.  Approximation  in  policy  space 

We have employed above the classical technique of successive
approximations in order to obtain a solution to the nonlinear functional
equation

(1)  $f(x) = \max_{0 \le y \le x} T(f, y)$.

We now wish to exploit a certain duality which is present in these
decision processes to show that we can choose the initial approximation
in such a way that we can always ensure that the approximation is
monotone. This means that we have uniformly better convergence with
each iteration.

Let us begin by introducing some terminology. We shall call a sequence
of allocations, i.e., a sequence of admissible choices of y, a policy, and
a policy which yields f(x) an optimal policy.

The duality that exists in the theory of dynamic programming arises
from the interconnection between the functions f(x) which measure the
maximum return and the policies which yield these maximum returns.
Actually a policy is a function, since a policy is a determination of y as
a function of x. It is worthwhile nonetheless to preserve this terminology
since it possesses certain advantages derived from intuition. If the policy
is not unique, y will not be a single-valued function of x.

It follows from the functional equation that a knowledge of f(x)
yields y(x), and conversely any y(x) determines f(x), iteratively by
means of the functional equation

(2)  $f(x) = T(f, y(x))$.

Thus, for example, if the optimal policy consisted of the choice y = 0
continually, f(x) would satisfy the functional equation

(3)  $f(x) = h(x) + f(bx)$,

which would yield the result

(4)  $f(x) = \sum_{n=0}^{\infty} h(b^n x)$.
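
The passage from the functional equation for the policy y = 0 to the series solution can be checked numerically. The sketch below sums the series for an assumed return function h and an assumed value of b, and the names are hypothetical; truncating after a few hundred terms is harmless here because the terms decay geometrically.

```python
import math


def h(x):
    return 1.0 - math.exp(-x)     # assumed return function with h(0) = 0


def return_under_y_zero(x, b=0.8, n_terms=400):
    """Partial sum of h(x) + h(b x) + h(b^2 x) + ..., the return when y = 0 always."""
    total = 0.0
    state = x
    for _ in range(n_terms):
        total += h(state)
        state *= b                # with y = 0 the quantity x shrinks to b x
    return total
```

For any x >= 0 the value `return_under_y_zero(x)` agrees with `h(x) + return_under_y_zero(0.8 * x)` up to the truncation error, which is the functional equation in question.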

As we have mentioned above, the purpose of our investigation is not
so much to determine f(x), which is really a by-product, but more
importantly, to determine the structure of the optimal policy, which
is to say to determine y(x).

This leads to an important and useful idea. Just as we can approximate
in the space of the functions f(x), so we can approximate in the space
of policies, y(x). Furthermore, in many ways, this is a more natural and
simpler form of approximation. The advantage of this type of
approximation analytically is that it always leads to monotone approximations.
From the standpoint of applications, it is by far the more natural
approximation since it is usually the one part of the problem about
which a certain amount is known as a result of experience.

Let $y_0(x)$ be an initial guess for an optimal policy and let $f_0(x)$ be
the return function derived from this policy function, which is to say
that $f_0(x)$ satisfies the functional equation

(5)  $f_0(x) = T(f_0, y_0(x))$,

an equation which we solve iteratively. To improve $y_0(x)$, we determine
$y_1(x)$ as a function of x which maximizes $T(f_0, y)$ for $0 \le y \le x$.
Assume for the moment that $y_1(x)$ is itself continuous in x (which need
not necessarily be the case), and that the return function $f_1(x)$ computed
using this policy is also continuous. This will always be the case, as we
point out again below, under the assumptions we have made. We now
continue in this way, generating a sequence of policies, $\{y_N(x)\}$, and
a sequence of return functions, $\{f_N(x)\}$.

It is easy to show, utilizing the methods described in the foregoing
sections, that under the assumptions we have made the sequence $\{f_N(x)\}$
is monotone increasing. A rigorous proof of the existence and
convergence of the sequences $\{y_N(x)\}$ and $\{f_N(x)\}$ described above seems
difficult to obtain. Consequently, we compromise on the following.

Theorem 3. Let $f_0(x)$ be the result of an initial approximation in policy
space, that is,

(6)  $f_0(x) = T(f_0, y_0(x))$,

where $y_0(x)$ is any continuous function of x satisfying the conditions

(7)  $0 \le y_0(x) \le x$.

Under the assumptions of Theorem 1, the sequence defined by

(8)  $f_{N+1}(x) = \max_{0 \le y \le x} T(f_N, y)$,  N = 0, 1, 2, ...,

converges uniformly to the solution f(x) obtained, and this convergence is
monotone.

Proof. Let us demonstrate the monotonicity, which is the essential
feature, first. We have

(9)  $f_1(x) = \max_{0 \le y \le x} T(f_0, y)$.

Comparing the definition of $f_0$ given in (5) with the definition of $f_1$ above,
we see that $f_1 \ge f_0$ for all values of x. From this it follows inductively
that $f_{N+1} \ge f_N$ for all values of $x \ge 0$.

It remains to prove the continuity of the function $f_0(x)$ for $x \ge 0$.
The conditions upon g and h which we have imposed above show that
the formal series for $f_0(x)$,

(10)  $f_0(x) = g(y_0) + h(x - y_0) + \ldots$,

obtained iteratively, converges uniformly in any finite interval and
represents a continuous function of x for $x \ge 0$, if $y_0(x)$ is a continuous
function of x.
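
The first step of an approximation in policy space can be sketched as follows. The initial policy is taken to be a fixed fraction, $y_0(x) = \theta x$, so that its return $f_0$ is the series (10) and can be summed directly; the improvement step then maximizes $T(f_0, y)$ over a grid of y. The functions g and h, the values of a, b and $\theta$, and all names are assumptions chosen for the example.

```python
import math

A, B = 0.5, 0.8           # assumed values of the fractions a and b


def g(y):
    return math.sqrt(y)   # assumed return functions with g(0) = h(0) = 0


def h(y):
    return 0.5 * y


def policy_return(x, theta, n_terms=300):
    """f_0(x) for the policy y_0(x) = theta * x, summed as the series (10)."""
    shrink = A * theta + B * (1.0 - theta)      # fraction of x left after a stage
    total, state = 0.0, x
    for _ in range(n_terms):
        total += g(theta * state) + h((1.0 - theta) * state)
        state *= shrink
    return total


def improved_return(x, theta, n_choices=200):
    """f_1(x) = max_y T(f_0, y), searched over a grid of y plus y_0(x) itself."""
    candidates = [x * (j / n_choices) for j in range(n_choices + 1)]
    candidates.append(theta * x)
    best_value, best_y = -math.inf, 0.0
    for y in candidates:
        rest = x - y
        value = g(y) + h(rest) + policy_return(A * y + B * rest, theta)
        if value > best_value:
            best_value, best_y = value, y
    return best_value, best_y
```

Because the old choice $y_0(x)$ is always among the candidates, the improved return can never fall below `policy_return(x, theta)`, which is the monotonicity asserted in Theorem 3.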

§  12.  Properties  of  the  solution — I:  Convexity 

Let  us  now  show  that  we  can  derive  certain  structural  properties  of 
the  optimal  policy  from  various  simple  structural  properties  of  the 
functions  g  and  h.  The  structure  of  the  optimal  policy  y  (x)  and  that  of 
the  return  function  /  (x)  turn  out  to  be  intimately  entwined. 

Our  first  result  in  this  direction  is 

Theorem  4.  If,  in  addition  to  the  assumptions  in  Theorem  1,  we  impose 
the  conditions  that  g  and  h  be  convex  functions  of  x,  then  f  (x)  will  be  a 
convex  function,  and  for  each  value  of  x,  y  will  equal  0  or  x. 

Proof. The proof will be inductive. Since

(1)  $f_1(x) = \max_{0 \le y \le x} (g(y) + h(x - y))$

and g(y) + h(x - y) is convex as a function of y for $0 \le y \le x$, it
follows that

(2)  $f_1(x) = \max (g(x), h(x))$,

since the maximum of a convex function must occur at one of the
end-points. As the maximum of two convex functions, $f_1(x)$ is convex.

Since $g(y) + h(x - y) + f_1(ay + b(x - y))$ is a convex function of
y for y in [0, x] it follows by repetition of the above argument that

(3)  $f_2(x) = \max (g(x) + f_1(ax), h(x) + f_1(bx))$

is a convex function of x. We see then, inductively, that $f_N(x)$ is convex,
and thus that the limit function f(x) is convex.

Turning to the equation $f(x) = \max_{0 \le y \le x} T(f, y)$, the convexity of f
reduces this to the simpler equation

(4)  $f(x) = \max (g(x) + f(ax), h(x) + f(bx))$,

showing that y = 0 or x for each value of x. This equation is,
surprisingly, still a difficult equation to solve in general. We shall consider
a particular case of it below.
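
The end-point property used in this proof is easy to test by brute force. The sketch below compares the end-point formulas (2) and (3) with a direct search over a grid of y, for one assumed pair of convex functions g and h and assumed values of a and b; the names are hypothetical.

```python
import math

A, B = 0.5, 0.8                     # assumed values of the fractions a and b


def g(y):
    return y * y                    # assumed convex return functions, zero at 0


def h(y):
    return 0.5 * (math.exp(y) - 1.0)


def f1(x):
    return max(g(x), h(x))          # equation (2)


def brute_force(objective, x, n_choices=400):
    """Largest value of objective(y) over an even grid of y in [0, x]."""
    return max(objective(x * (j / n_choices)) for j in range(n_choices + 1))


def f1_brute(x):
    return brute_force(lambda y: g(y) + h(x - y), x)


def f2_endpoint(x):
    return max(g(x) + f1(A * x), h(x) + f1(B * x))        # equation (3)


def f2_brute(x):
    return brute_force(lambda y: g(y) + h(x - y) + f1(A * y + B * (x - y)), x)
```

For these assumed functions the search over interior points never beats the better of the two end-points, so `f1_brute` and `f2_brute` reproduce `f1` and `f2_endpoint`.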

§ 13. Properties of the solution — II: Concavity

Let us now demonstrate that an analogous result holds for the case
where g and h are both strictly concave functions of x for $x \ge 0$.

Theorem 5. If, in addition to the assumptions in Theorem 1, we impose
the conditions that g and h be strictly concave functions of x, then f(x)
will be a strictly concave function of x.

In this case, the optimal policy will be unique.

Proof.  Let  us  consider  the  one-stage  case  first,  and  perform  some 
simple  calculations  which  will  show  us  why  the  result  should  be  true, 
before  proceeding  to  a  rigorous  proof  using  a  different  and  more  general 
technique. 

We have

(1)  $f_1(x) = \max_{0 \le y \le x} [g(y) + h(x - y)]$.

Since g and h are strictly concave functions, the function g(y) + h(x - y)
is a strictly concave function of y. There is, in consequence, a single
maximum, which may, nonetheless, occur at an end point y = 0 or
y = x. Let us suppose for the moment that it occurs at an interior point,
and that g and h possess second derivatives. Then,

(2)  $f_1(x) = g(y) + h(x - y)$,

where y is determined as a function of x by means of the relation

(3)  $g'(y) = h'(x - y)$.

Differentiation of (2) yields

(4)  $f_1'(x) = (g'(y) - h'(x - y)) \, dy/dx + h'(x - y) = h'(x - y)$,

and thus

(5)  $f_1''(x) = h''(x - y)(1 - dy/dx)$.

Differentiating the relation in (3), we obtain

(6)  $g''(y) \, dy/dx = h''(x - y)(1 - dy/dx)$,

which yields

(7)  $dy/dx = h''(x - y) / (g''(y) + h''(x - y))$.

This shows that 1 > dy/dx > 0, and thus, returning to (5), that $f_1''(x) < 0$.
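
The formula for dy/dx can be checked against a finite-difference estimate. The sketch assumes g(y) = 2 log(1 + y) and h(s) = log(1 + s), for which the interior maximum is y = (1 + 2x)/3 whenever x >= 1; these functions and all names are assumptions made for the example.

```python
def g1(y):
    return 2.0 / (1.0 + y)            # g'(y) for the assumed g(y) = 2 log(1 + y)


def g2(y):
    return -2.0 / (1.0 + y) ** 2      # g''(y)


def h1(s):
    return 1.0 / (1.0 + s)            # h'(s) for the assumed h(s) = log(1 + s)


def h2(s):
    return -1.0 / (1.0 + s) ** 2      # h''(s)


def interior_maximizer(x, tol=1e-13):
    """Solve g'(y) = h'(x - y) by bisection; assumes the root lies inside (0, x)."""
    lo, hi = 0.0, x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g1(mid) - h1(x - mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope_from_formula(x):
    y = interior_maximizer(x)
    return h2(x - y) / (g2(y) + h2(x - y))                 # equation (7)


def slope_by_differences(x, step=1e-5):
    return (interior_maximizer(x + step) - interior_maximizer(x - step)) / (2.0 * step)


def second_derivative(x):
    y = interior_maximizer(x)
    return h2(x - y) * (1.0 - slope_from_formula(x))       # equation (5)
```

At x = 2 both slope estimates come out at 2/3, which lies between 0 and 1, and `second_derivative(2.0)` is negative as claimed.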

If the maximum is not actually inside, we can force it to be by various
modifications of the functions g and h which prevent the maximum from
ever being at y = 0 or y = x; e.g., by addition of a term
$\epsilon \log y(x - y)$, where $\epsilon$ is a small positive quantity. We can
then proceed inductively and establish the same result for all the members
of the sequence $f_n(x)$. This is, however, a rather clumsy method which does
not extend without pain to multi-dimensional problems. We shall
therefore use a more elegant and simple method.

Lemma 1. If G(x, y) is a concave function⁴ of x and y for $x, y \ge 0$, then
f(x) as defined by

(8)  $f(x) = \max_{0 \le y \le x} G(x, y)$

is a concave function of x for $x \ge 0$.

Proof. We have, for $0 \le \lambda \le 1$,

(9)  $f(\lambda x + (1 - \lambda) z) = \max_{0 \le y \le \lambda x + (1 - \lambda) z} G(\lambda x + (1 - \lambda) z, y)$.

We may replace y by the quantity $y = \lambda y_1 + (1 - \lambda) y_2$ where $y_1$ and
$y_2$ range independently over the intervals $0 \le y_1 \le x$, $0 \le y_2 \le z$. Then

(10)  $f(\lambda x + (1 - \lambda) z) = \max_{0 \le y_1 \le x, \ 0 \le y_2 \le z} G(\lambda x + (1 - \lambda) z, \lambda y_1 + (1 - \lambda) y_2)$.

Since G(x, y) is concave in x and y, we have

(11)  $G(\lambda x + (1 - \lambda) z, \lambda y_1 + (1 - \lambda) y_2) \ge \lambda G(x, y_1) + (1 - \lambda) G(z, y_2)$.

Hence

(12)  $f(\lambda x + (1 - \lambda) z) \ge \max_{0 \le y_1 \le x, \ 0 \le y_2 \le z} [\lambda G(x, y_1) + (1 - \lambda) G(z, y_2)]$

      $\ge \lambda \max_{0 \le y_1 \le x} G(x, y_1) + (1 - \lambda) \max_{0 \le y_2 \le z} G(z, y_2)$

      $= \lambda f(x) + (1 - \lambda) f(z)$.
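
Lemma 1 can be spot-checked numerically. The sketch below takes one assumed jointly concave function G, computes f(x) by a ternary search in y (legitimate because G is concave in y), and evaluates the difference between the two sides of the concavity inequality. The particular G and the function names are assumptions made for the example.

```python
import math


def G(x, y):
    """An assumed jointly concave function: concave functions of linear forms, summed."""
    return math.log(1.0 + y) + 2.0 * math.sqrt(1.0 + x - y) - 0.1 * x * x


def f(x, n_steps=200):
    """f(x) = max of G(x, y) over 0 <= y <= x, located by ternary search."""
    lo, hi = 0.0, x
    for _ in range(n_steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if G(x, m1) < G(x, m2):
            lo = m1
        else:
            hi = m2
    return G(x, 0.5 * (lo + hi))


def concavity_gap(x, z, lam):
    """f(lam x + (1 - lam) z) - [lam f(x) + (1 - lam) f(z)]; the lemma says >= 0."""
    return f(lam * x + (1.0 - lam) * z) - (lam * f(x) + (1.0 - lam) * f(z))
```

For every pair x, z and every weight between 0 and 1 that one cares to try, `concavity_gap` should come out non-negative up to rounding.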

Let us now apply this lemma to prove Theorem 5. It is easily verified
that g(y) + h(x - y) is a concave function of x and y if g and h are
concave functions. This shows immediately that $f_1(x)$ is concave.
Similarly, since $f_1(ay + b(x - y))$ is a concave function of x and y, $f_2(x)$
as defined by the basic recurrence relation is a concave function. We
thus proceed inductively and show that each function in the sequence
$\{f_n(x)\}$ is a strictly concave function, and hence that the limit function
is concave. That it is strictly concave follows from the strict concavity
of g and h, using Lemma 1 upon the functional equation for f(x).

Once we have established the strict concavity of f(x), the uniqueness
of the maximizing y and thus of the optimal policy follows immediately.
This completes the proof of Theorem 5.

⁴ Concavity in both x and y means that
$G(\lambda x_1 + (1 - \lambda) x_2, \lambda y_1 + (1 - \lambda) y_2) \ge \lambda G(x_1, y_1) + (1 - \lambda) G(x_2, y_2)$,
for $0 \le \lambda \le 1$.

§  14.  Properties  of  the  solution — III:  Concavity 

Let  us  now  show  that  the  assumption  of  concavity  enables  us  to  tell 
quite  a  bit  more  about  the  nature  of  the  solution. 

Theorem 6. Let us assume that

(1)  a. g(x) and h(x) are both strictly concave for $x \ge 0$, monotone
        increasing with continuous derivatives, and that g(0) = h(0) = 0.

     b. $g'(0)/(1 - a) > h'(0)/(1 - b)$, $h'(0) > g'(\infty)$, b > a.

Then the optimal policy has the following form:

(2)  a. y = x for $0 \le x \le \bar{x}$, where $\bar{x}$ is the root of
        $h'(0) = g'(x) + (a - b) g'(ax) + (a - b) a g'(a^2 x) + \ldots$

     b. y = y(x) for $x \ge \bar{x}$, where y(x) is a function satisfying the
        inequalities 0 < y(x) < x, and y(x) is the solution of

(3)  $g'(y) - h'(x - y) + (a - b) f'(ay + b(x - y)) = 0$.

Remark. We have given the solution for only one of the possible
combinations of inequalities connecting g'(0), h'(0), b and a. It will be
easily seen from the procedure below that corresponding results hold
for the other cases. Furthermore, the number of cases can be halved
by the observation that the interchange of y and x - y results in an
interchange of a and b.

Proof. Let us employ the method of successive approximations. Set

(4)  $f_1(x) = \max_{0 \le y \le x} [g(y) + h(x - y)]$.

Since g'(0) > h'(0), which follows from (1b) because b > a, we have for
small x that g'(y) - h'(x - y) > 0 for y in the interval [0, x]. Hence
g(y) + h(x - y) is monotone increasing in $0 \le y \le x$ and the maximum
occurs at y = x. As x increases, the equation g'(y) - h'(x - y) = 0 will
ultimately have a root at y = x, and then as x increases further a root
inside the interval [0, x]. The critical value of x is given as the solution
of g'(x) - h'(0) = 0. This equation has precisely one solution, which we
call $x_1$. For $x \ge x_1$ let $y_1 = y_1(x)$ be the unique solution of
g'(y) = h'(x - y). The uniqueness of solution is a consequence of the
concavity assumptions concerning g and h, and the existence of a solution
is a consequence of the continuity of g' and h'.

Thus we have

(5)  $f_1(x) = g(x)$,  $0 \le x \le x_1$,

     $f_1(x) = g(y_1) + h(x - y_1)$,  $x \ge x_1$,

and

(6)  $f_1'(x) = g'(x)$,  $0 \le x < x_1$,

     $f_1'(x) = [g'(y_1) - h'(x - y_1)] \, dy_1/dx + h'(x - y_1) = h'(x - y_1)$,  $x > x_1$.

Since $y_1(x_1) = x_1$, we see that $f_1'(x)$ is continuous at $x = x_1$, and hence
for all values of $x \ge 0$. Furthermore $f_1(x)$ is a concave function of x;
cf. the analysis of § 13.

Now let us turn to the second approximation

(7)  $f_2(x) = \max_{0 \le y \le x} [g(y) + h(x - y) + f_1(ay + b(x - y))]$.

The critical function is now

      $D(y) = g'(y) - h'(x - y) + f_1'(ay + b(x - y))(a - b)$.

Since

      $g'(0) - h'(0) + f_1'(0)(a - b) = g'(0) - h'(0) + g'(0)(a - b) > h'(0) [\{(1 - a)(1 + a - b)/(1 - b)\} - 1] > 0$,

we see that D(y) is again positive for all y in [0, x] for small x. Hence the
maximum occurs in (7) at y = x for small x. As x increases, there will
be a first value of x where D(x) = 0. This value, $x_2$, is determined by
the equation $g'(x) = h'(0) + (b - a) f_1'(ax)$. Comparing the two
equations

(8)  $g'(x) = h'(0)$,

     $g'(x) = h'(0) + (b - a) f_1'(ax)$,

we see that $0 < x_2 < x_1$.

Hence the equation for $x_2$ has the simple form

(9)  $g'(x) = h'(0) + (b - a) g'(ax)$.

Thus y = x for $0 \le x \le x_2$ in (7) and $y = y_2(x)$ for $x \ge x_2$, where
$y_2(x)$ is the unique solution of

(10)  $g'(y) = h'(x - y) + (b - a) f_1'(ay + b(x - y))$.
Furthermore

(11)  $f_2'(x) = g'(x) + a f_1'(ax)$,  $0 \le x \le x_2$,

      $f_2'(x) = h'(x - y_2) + b f_1'(a y_2 + b(x - y_2))$,  $x \ge x_2$,

and $f_2'(x)$ is continuous at $x = x_2$, by virtue of (9).

Comparing (10) with the equation g'(y) = h'(x - y) defining $y_1$, we
see that $y_2(x) < y_1(x)$. In order to carry out the induction and obtain
the corresponding results for all members of the sequence $\{f_n\}$, defined
recurrently by the relation

      $f_{n+1}(x) = \max_{0 \le y \le x} [g(y) + h(x - y) + f_n(ay + b(x - y))]$,

we require the essential inequality $f_2'(x) \ge f_1'(x)$. There are three
intervals $[0, x_2]$, $[x_2, x_1]$, $[x_1, \infty]$ to examine, each one requiring a
separate argument. Using (10) and (11) we have

(12)  $f_2'(x) = [b g'(y_2) - a h'(x - y_2)] / (b - a)$,

for $x \ge x_2$. Combining (6) and the equation for $y_1$ we have

(13)  $f_1'(x) = [b g'(y_1) - a h'(x - y_1)] / (b - a)$,

for $x \ge x_1$. The function $[b g'(y) - a h'(x - y)] / (b - a)$ is monotone
decreasing in y for $0 \le y \le x$. Since $y_2 < y_1$ we see that
$f_2'(x) > f_1'(x)$. This completes the proof for the interval $[x_1, \infty]$.
In the interval $[0, x_2]$, (6) and (11) give $f_2'(x) - f_1'(x) = a f_1'(ax) \ge 0$,
since $f_1$ is monotone increasing. The remaining interval is $[x_2, x_1]$.
In this interval, we have

(14)  $f_1'(x) = g'(x)$,

      $f_2'(x) = [b g'(y_2) - a h'(x - y_2)] / (b - a)$.

Hence in this interval, since $0 \le y_2 \le x$,

(15)  $f_2'(x) \ge [b g'(x) - a h'(0)] / (b - a) \ge g'(x)$,

since $g'(x) \ge h'(0)$ is a consequence of $g'(y) \ge h'(x - y)$ for $0 \le y \le x$
and $0 \le x \le x_1$. This completes the proof that $f_2'(x) \ge f_1'(x)$.

We now have all the ingredients of an inductive proof which shows
that

(16)  a. $x_1 > x_2 > \ldots > x_n > \ldots > 0$,

      b. $f_1'(x) \le f_2'(x) \le \ldots \le f_n'(x) \le \ldots$,

      c. $y_1(x) \ge y_2(x) \ge \ldots \ge y_n(x) \ge \ldots$

Since $f_n(x)$ converges to f(x), $f_n'(x)$ to f'(x), $y_n(x)$ to y(x) and $x_n$ to $\bar{x}$,
we see that the solution has the indicated form.
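
The decreasing sequence of critical values $x_1 > x_2 > \ldots$ can be computed for a concrete case. The sketch assumes g(x) = 2 log(1 + x), h(x) = log(1 + x), a = 0.5 and b = 0.6, which satisfy the hypotheses of Theorem 6. It also assumes that the equation for $x_n$ is obtained by repeating the step that led to (9), namely $g'(x) = h'(0) + (b - a) \sum_{k=0}^{n-2} a^k g'(a^{k+1} x)$; for n = 1 and n = 2 this is exactly (8) and (9), while for larger n it is an extrapolation made here and not a formula quoted from the text. All names are hypothetical.

```python
A, B = 0.5, 0.6              # assumed values of a and b, with b > a


def g_prime(x):
    return 2.0 / (1.0 + x)   # g'(x) for the assumed g(x) = 2 log(1 + x)


H_PRIME_0 = 1.0              # h'(0) for the assumed h(x) = log(1 + x)


def critical_function(x, n):
    """g'(x) - h'(0) - (b - a) * sum_{k=0}^{n-2} a^k g'(a^(k+1) x); its root is x_n."""
    tail = sum(A ** k * g_prime(A ** (k + 1) * x) for k in range(n - 1))
    return g_prime(x) - H_PRIME_0 - (B - A) * tail


def threshold(n, lo=0.0, hi=2.0, tol=1e-12):
    """Root of critical_function(., n) by bisection; it is positive at lo, negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if critical_function(mid, n) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these assumptions `threshold(1)` is 1, and the values `threshold(2)`, `threshold(3)`, ... decrease steadily while staying positive, in agreement with (16a).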

§  15.  An  “ornery”  example 

Having imposed successively the conditions that g and h be both
convex or both concave, let us now show by means of an example
that the solution can be exceedingly complicated if we allow more
general functions possessing points of inflection.

Let us consider the equation

(1)  $f(x) = \max_{0 \le y \le x} [e^{-c_1/y} + e^{-c_2/(x - y)} + f(.8y + .9(x - y))]$,

where $c_1$ and $c_2$ stand for the two positive numerical constants of the
example, which are illegible in this copy. The function $e^{-c/x}$ is used
since it is one of the simplest possessing a point of inflection.
Determining f(x) by means of the method of successive approximations,
we obtain a well-behaved curve


Figure  1 

Note,  however,  the  strange  behavior  of  y  (x) ! 


Figure  2 
As soon as we allow changes of sign on the part of g''(x) and h''(x),
we seem to encounter functional equations which defy precise analysis.

§  16.  A  particular  example — I 

Figures  1  and  2  show  the  difficulties  that  can  be  encountered  in  the 
pursuit  of  general  solutions.  Let  us  then  consider  some  simpler  equations 
which  can  be  used  for  approximation  purposes. 


Theorem 7. The continuous solution of

(1)  $f(x) = \max [c x^d + f(ax), \ e x^g + f(bx)]$,  f(0) = 0,

subject to

(2)  a. 0 < a, b < 1;  c, d, e, g > 0,

     b. 0 < d < g,

is given by

(3)  $f(x) = c x^d / (1 - a^d)$,  $0 \le x \le \bar{x}$,

     $f(x) = e x^g + f(bx)$,  $x \ge \bar{x}$,

where

(4)  $\bar{x} = [\{c/(1 - a^d)\} / \{e/(1 - b^d)\}]^{1/(g - d)}$.

Since 0 < b < 1, f(x) may be found explicitly in the intervals
$[\bar{x}, \bar{x}/b], \ldots, [\bar{x}/b^n, \bar{x}/b^{n+1}], \ldots$, for n = 0, 1, 2, ...

Proof. Let us represent by A the operation of choosing $c x^d + f(ax)$,
and by B the operation of choosing $e x^g + f(bx)$. A solution corresponding
to an optimal sequence of choices, S, may then be represented
symbolically by

(5)  $S = A^{a_1} B^{b_1} A^{a_2} B^{b_2} \ldots$,

where $a_i$ and $b_i$ are positive integers or zero, and $A^{a_i}$ means the choice
A repeated $a_i$ times, with $B^{b_i}$ having a similar meaning.

Let us assume for the moment that the solution does have the indicated
form and show how to calculate $\bar{x}$. At the point $\bar{x}$ either an A or B
decision is optimal, while below $\bar{x}$ only an A decision is optimal.
Consequently, symbolically, $\bar{x}$ is the point where

(6)  $B A^{\infty} = A^{\infty}$.

To compute $A^{\infty}$ we write

(7)  $f(x) = c x^d + f(ax) = c x^d + c (ax)^d + c (a^2 x)^d + \ldots = c x^d / (1 - a^d)$.

Similarly $B A^{\infty}$ yields

(8)  $f(x) = e x^g + c b^d x^d / (1 - a^d)$.

Equating the two expressions, we find that $\bar{x}$ has the stated value.

It remains to prove that the solution has the desired form. Let us
begin by showing that A is always used when x is small. To do this it
is sufficient to show that $f(x) = c x^d / (1 - a^d)$ is a solution for small x,
and then to invoke the uniqueness theorem.⁵ We must assure ourselves
that

(9)  $c x^d / (1 - a^d) = \max [\, c x^d / (1 - a^d), \ e x^g + c b^d x^d / (1 - a^d) \,]$

for small x. This, however, is clear if g > d > 0 and 0 < b < 1.

We now proceed inductively. Let z be the smallest value of x for which
a B-choice is optimal. At this point $B A^{\infty} = A^{\infty}$. This means that
$z = \bar{x}$. Let us now consider the interval $x > \bar{x}$, and begin by asking
for the point p where AB and BA are equally effective as a set of first two
choices.

We have, using an obvious notation,

(10)  $f_{AB}(x) = c x^d + e a^g x^g + f(abx)$,

      $f_{BA}(x) = e x^g + c b^d x^d + f(abx)$.

Hence the required point p is given by

(11)  $p = [c (1 - b^d) / \{e (1 - a^g)\}]^{1/(g - d)}$.

Since g > d, we see that $p < \bar{x}$.

It follows then from the fact that $f_{AB}(x) < f_{BA}(x)$ for x > p that
for $x > \bar{x}$, AB plus an optimal continuation is inferior to BA plus an
optimal continuation. From this we see that A cannot be used for $x > \bar{x}$
unless followed by $A^{\infty}$, which we know is also impossible. This completes
the proof.

⁵ Strictly speaking, we haven't established this uniqueness theorem yet. However,
it is easy to see that the method used to establish Theorem 1 works equally well
in this case.
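
The two critical points of this proof are easily evaluated. The sketch below computes the point where $B A^{\infty}$ and $A^{\infty}$ give equal returns, obtained by equating (7) and (8), and the point p of (11). The function names and the numerical values used to try them are assumptions made for the example.

```python
def x_bar(c, d, e, g, a, b):
    """Point where B A^inf and A^inf give the same return: equate (7) and (8)."""
    return (c * (1.0 - b ** d) / (e * (1.0 - a ** d))) ** (1.0 / (g - d))


def p_point(c, d, e, g, a, b):
    """Point where AB and BA are equally effective as first two choices, (11)."""
    return (c * (1.0 - b ** d) / (e * (1.0 - a ** g))) ** (1.0 / (g - d))


def a_forever(x, c, d, a):
    return c * x ** d / (1.0 - a ** d)                     # equation (7)


def b_then_a_forever(x, c, d, e, g, a, b):
    return e * x ** g + a_forever(b * x, c, d, a)          # equation (8)
```

For instance, with c = d = e = 1, g = 2 and a = b = 0.5 the first point is 1 and p is 2/3, so that p lies below it as the proof requires; below that first point the return from A repeated forever exceeds that from B followed by A forever.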

§ 17. A particular example — II

Another interesting case is that where g and h are quadratic in x.
We leave as an exercise the following result:

Theorem 8. Let c, d > 0 and $0 < b \le a < 1$. Let

(1)  $f(x) = \max_{0 \le y \le x} [cy - y^2 + d(x - y) - (x - y)^2 + f(ay + b(x - y))]$,

     f(0) = 0.

Then, in the interval⁶ $0 \le x \le \min (c/2, d/2)$, f(x) has the following form,
which depends on the sign of c/(1 - a) - d/(1 - b):

Case I: c/(1 - a) = d/(1 - b).

(2)  $f(x) = \frac{c}{1 - a} x - \frac{\alpha^2 + (1 - \alpha)^2}{1 - [(a - b)\alpha + b]^2} x^2$,

where

(3)  [the relation determining $\alpha$ is illegible in this copy, apart from
     the fragment $1 - b + (b - a)\alpha$]

Case II: c/(1 - a) < d/(1 - b).

(4)  $f(x) = \frac{d}{1 - b} x - \frac{x^2}{1 - b^2}$,

for $0 \le x \le \min (\lambda, c/2, d/2)$, where

(5)  $\lambda = (1 + b) [d(1 - a) - c(1 - b)] / \{2(1 - ab)\}$.

When $\lambda < \min (c/2, d/2)$ use of (1) as a recursion formula enables one to
obtain f(x) over the entire interval of interest.

Case III: c/(1 - a) > d/(1 - b).

(6)  $f(x) = \frac{c}{1 - a} x - \frac{x^2}{1 - a^2}$,

for $0 \le x \le \min (\mu, c/2, d/2)$, where

(7)  $\mu = (1 + a) [c(1 - b) - d(1 - a)] / \{2(1 - ab)\}$.

⁶ This is the maximum interval over which the g and h functions are both
increasing.
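
Case II lends itself to a direct check. If y = 0 is chosen at every stage, the return satisfies $f(x) = dx - x^2 + f(bx)$, whose solution vanishing at 0 is $dx/(1 - b) - x^2/(1 - b^2)$; this closed form is derived here as an assumption for the sketch. The code uses it as the continuation value in (1) and searches a grid of y for the best first allocation, with assumed values a = 0.6, b = 0.5, c = 1, d = 2 and hypothetical names.

```python
A, B, C, D = 0.6, 0.5, 1.0, 2.0     # assumed a, b, c, d with c/(1 - a) < d/(1 - b)


def f_all_to_h(x):
    """Return when y = 0 at every stage: solves f(x) = d x - x^2 + f(b x), f(0) = 0."""
    return D * x / (1.0 - B) - x * x / (1.0 - B * B)


def lam():
    """The critical value given in (5)."""
    return (1.0 + B) * (D * (1.0 - A) - C * (1.0 - B)) / (2.0 * (1.0 - A * B))


def best_first_allocation(x, n_choices=1000):
    """Grid search for the y maximizing the right side of (1), with f_all_to_h
    used as the continuation value; valid while a x does not exceed lam()."""
    best_value, best_y = float("-inf"), 0.0
    for j in range(n_choices + 1):
        y = x * (j / n_choices)
        rest = x - y
        value = C * y - y * y + D * rest - rest * rest + f_all_to_h(A * y + B * rest)
        if value > best_value:
            best_value, best_y = value, y
    return best_value, best_y
```

With these values `lam()` is about 0.321. At x = 0.3 the search returns y = 0 and reproduces `f_all_to_h(0.3)`, while at x = 0.4 the best y is positive, so the policy y = 0 ceases to be optimal once x passes the critical value.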
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&  18.  Approximation  and  stability 

It  is,  of  course,  interesting  to  have  the  explicit  solutions  of  as  many 
equations  as  possible  available.  However,  the  true  importance  of  the 
explicit  solutions  of  simple  equations  lies  in  the  use  of  these  solutions 
as  approximate  solutions  to  more  obdurate  equations,  and  in  furnishing 
clues  to  the  nature  of  optimal  policies  for  more  complicated  processes. 

In the above sections we have derived explicit solutions for the case
where g and h have monomial forms $c x^d$, and for the case where they
are quadratic. Note that approximation to g(x) by means of $c x^d$ is
equivalent to an approximation to $\log g(e^x)$ by means of log c + dx,
a straight line, which is readily accomplished.
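
The straight-line fit mentioned here may be sketched as a least-squares fit in logarithmic coordinates. The routine below is one possible way of carrying it out, not a procedure given in the text; the function name and the sample points are assumptions, and g must be positive at every sample point.

```python
import math


def fit_monomial(g, xs):
    """Least-squares line through the points (log x, log g(x)).

    Returns (c, d) such that g(x) is approximated by c * x**d on the sample.
    """
    us = [math.log(x) for x in xs]
    vs = [math.log(g(x)) for x in xs]
    n = len(xs)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    covariance = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    variance = sum((u - mean_u) ** 2 for u in us)
    d = covariance / variance
    c = math.exp(mean_v - d * mean_u)
    return c, d
```

Applied to a function that is already a monomial, the fit returns its coefficient and exponent; applied to log(1 + x) on a few positive sample points it returns an exponent between 0 and 1.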

Observe  that  as  x  changes,  we  may  change  our  approximating  curves 
so  as  to  obtain  better  fits  if  we  wish  closer  approximations.  Furthermore, 
let  us  point  out  that  in  general  the  approximation  is  most  useful  as  an 
approximation  in  policy  space  rather  than  in  function  space. 

In order to use approximation techniques, we require an estimate
for the difference between the solutions⁷ of the two equations

(1)  $f(x) = \max_{0 \le y \le x} [u(x, y) + f(ay + b(x - y))]$,  f(0) = 0,

     $F(x) = \max_{0 \le y \le x} [v(x, y) + F(ay + b(x - y))]$,  F(0) = 0,

in terms of the difference between u(x, y) and v(x, y). This is a stability
theorem in the classical sense.

Let us prove

Theorem 9. Let f(x) and F(x) be the continuous solutions of the above
equations under the assumptions that u(x, y) and v(x, y) are continuous
in x and y for all $x, y \ge 0$, with 0 < a, b < 1, and that
$\sum_{n=0}^{\infty} m(c^n z) < \infty$, where
$m(z) = \max_{0 \le x \le z} [\max_{0 \le y \le x} \max \{|u(x, y)|, |v(x, y)|\}]$.
If

(2)  $\max_{0 \le x \le z} \{\max_{0 \le y \le x} |u(x, y) - v(x, y)|\} = D(z)$,

and $\sum_{n=0}^{\infty} D(c^n z) < \infty$, c = Max (a, b), then

(3)  $|f(x) - F(x)| \le \sum_{n=0}^{\infty} D(c^n x)$.


7  The  existence  and  uniqueness  of  these  solutions  is  assured  by  the  natural 
modification  of  the  proof  of  Theorem  1.  When  we  speak  of  the  solution,  we  shall 
mean  the  continuous  solution,  or,  generally,  the  solution  furnished  by  the  existence 
theorem. 
Proof. Define

(4)  $f_1(x) = \max_{0 \le y \le x} u(x, y)$,

     $f_{N+1}(x) = \max_{0 \le y \le x} [u(x, y) + f_N(ay + b(x - y))]$,

     $F_1(x) = \max_{0 \le y \le x} v(x, y)$,

     $F_{N+1}(x) = \max_{0 \le y \le x} [v(x, y) + F_N(ay + b(x - y))]$.

We know, using the methods given previously, that $f_N(x)$ converges to
f(x), and $F_N(x)$ converges to F(x) as $N \to \infty$.

Let us estimate the difference between $f_1$ and $F_1$. Clearly,

(5)  $|f_1(x) - F_1(x)| \le \max_{0 \le y \le x} |u(x, y) - v(x, y)| \le D(x)$.

Proceeding as in § 7, we have

(6)  $|f_{N+1}(x) - F_{N+1}(x)| \le \max_{0 \le y \le x} |f_N(ay + b(x - y)) - F_N(ay + b(x - y))| + \max_{0 \le y \le x} |u(x, y) - v(x, y)|$.

It now follows inductively that

(7)  $|f_{N+1}(x) - F_{N+1}(x)| \le \sum_{n=0}^{N} D(c^n x)$.

Letting $N \to \infty$, we obtain (3).
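
The bound (3) can be compared with an exactly solvable case. Assume, for illustration only, that u(x, y) = c1 y + c2 (x - y) and that v(x, y) = u(x, y) + 0.1 x, so that D(z) = 0.1 z. Both equations then have linear solutions, the slope being max(c1/(1 - a), c2/(1 - b)) for the corresponding coefficients. The sketch evaluates the bound and this slope; the names and values are assumptions.

```python
def stability_bound(D, c, x, n_terms=500):
    """Partial sum of D(x) + D(c x) + D(c^2 x) + ..., the right side of (3)."""
    return sum(D(c ** n * x) for n in range(n_terms))


def linear_solution_slope(c1, c2, a, b):
    """Slope K of the solution f(x) = K x when the stage return is c1 y + c2 (x - y).

    K is the solution of K = max(c1 + a K, c2 + b K).
    """
    return max(c1 / (1.0 - a), c2 / (1.0 - b))
```

With a = 0.5, b = 0.8, c1 = 1 and c2 = 1.2 the two slopes are 6 and 6.5, so at x = 2 the solutions differ by 1, and `stability_bound(lambda z: 0.1 * z, 0.8, 2.0)` also equals 1: in this example the estimate of Theorem 9 is attained.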

§  19.  Time-dependent  processes 

We  have  tacitly  assumed  in  the  foregoing  pages  that  the  processes 
under  consideration  were  time-independent  in  that  the  total  return 
depended  only  upon  the  initial  quantity  x  and  the  duration  of  the 
process N, and not upon the time at which the process was initiated.
Let  us  now  see  how  we  can  handle  situations  in  which  this  is  not  the  case. 

Let us assume that as a result of the division of $x$ into $y$ and $(x - y)$
at the $k$th stage, we receive a return $g_k(x, y)$ and are left with a quantity
$a_k(x, y)$. It is required to determine the allocation policy which maximizes
the total $N$-stage return.

We shall assume that $g_k(x, y)$ is continuous in $x$ and $y$ for $x \ge 0$
and $0 \le y \le x$ and that $a_k(x, y)$ is likewise continuous in this region
and satisfies the inequality $0 \le a_k(x, y) \le ax$, $a < 1$, for $k = 1, 2, \ldots$
Define

(1)  $f_{k,N}(x)$ = total $N$-stage return obtained starting with a quantity $x$
     at stage $k$ and employing an optimal policy.

We have

(2)  $f_{k,1}(x) = \max_{0 \le y \le x} g_k(x, y)$,

and for $N \ge 2$, arguing as in the preceding pages,

(3)  $f_{k,N}(x) = \max_{0 \le y \le x} \bigl( g_k(x, y) + f_{k+1,N-1}(a_k(x, y)) \bigr)$.

Since  the  double  subscript  is  distressing  both  analytically,  esthetically, 
and  above  all,  computationally,  let  us  see  whether  or  not  we  can  restore 
the  single  subscript  relation.  Having  made  up  our  mind  that  we  are 
interested in an $N$-stage process starting at stage 1, let us define

(4)  $f_k(x)$ = total return obtained starting with a quantity $x$ at stage $k$
     and ending at stage $N$, employing an optimal policy,
     $k = 1, 2, \ldots, N$.

Then

(5)  $f_N(x) = \max_{0 \le y \le x} g_N(x, y)$,
     $f_k(x) = \max_{0 \le y \le x} [g_k(x, y) + f_{k+1}(a_k(x, y))]$,  $k = 1, 2, \ldots, N - 1$.

This simplification is essential if we are interested in computational
solutions,  since  the  difference  between  the  effort  involved  in  the  tabulation 
of  functions  of  one  variable  and  functions  of  two  variables  is  enormous, 
while  that  between  the  tabulation  of  functions  of  two  variables  and 
functions  of  three  variables  may  be  the  difference  between  a  feasible 
and  unfeasible  approach. 
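
As a rough illustration of the single-subscript recurrence (5), the following Python sketch tabulates each $f_k$ on a uniform grid, working backward from stage $N$. The name `solve_time_dependent`, the grid sizes, the linear interpolation, and the brute-force search over $y$ are all assumptions made for this example; they are not part of the text.

```python
import math


def solve_time_dependent(g, a, N, x_max, n_grid=201, n_y=101):
    """Tabulate f_k(x), k = N, N-1, ..., 1, on a uniform grid over [0, x_max].

    g(k, x, y) is the stage-k return; a(k, x, y) is the quantity left over,
    assumed to lie in [0, x]. Returns (xs, tables) with tables[k][i] ~ f_k(xs[i]).
    """
    dx = x_max / (n_grid - 1)
    xs = [i * dx for i in range(n_grid)]

    def interp(table, z):
        # Linear interpolation of a tabulated function (hypothetical helper).
        z = min(max(z, 0.0), x_max)
        i = min(int(z / dx), n_grid - 2)
        t = (z - xs[i]) / dx
        return (1 - t) * table[i] + t * table[i + 1]

    tables = {}
    f_next = None  # plays the role of f_{k+1}; absent at the last stage
    for k in range(N, 0, -1):
        f_k = []
        for x in xs:
            best = -math.inf
            for j in range(n_y):
                y = x * j / (n_y - 1)  # candidate allocation, 0 <= y <= x
                val = g(k, x, y)
                if f_next is not None:
                    val += interp(f_next, a(k, x, y))
                best = max(best, val)
            f_k.append(best)
        tables[k] = f_k
        f_next = f_k
    return xs, tables
```

Only one table of a single variable is carried from stage to stage, which is the saving the paragraph above describes.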

The case of unbounded processes, i.e., $N = \infty$, yields the set of
functional equations

(6)  $f_k(x) = \max_{0 \le y \le x} [g_k(x, y) + f_{k+1}(a_k(x, y))]$.

It is not difficult to obtain the analogues of Theorem 1 for these systems.

§ 20. Multi-activity processes

The  process  we  have  been  using  for  expository  purposes  is  the  simplest 
of  its  category  since  we  allow  only  one  type  of  resource,  and  require  only 
one allocation at each stage. Let us now discuss the formulation of more
general  and  more  realistic  processes. 

Let there be $M$ different kinds of resources, in quantities $x_1, x_2, \ldots, x_M$
respectively. At each stage, a quantity $x_{ij}$ of the $i$th resource is
utilized to produce an additional quantity of the $j$th resource. Hence
we have the equations, relating the resources at the $(k+1)$st stage to
the resources at the $k$th stage,

(1)  $x_i(k+1) = x_i(k) - \sum_{j=1}^{M} x_{ij}(k) + g_i(x_{1i}(k), x_{2i}(k), \ldots, x_{Mi}(k))$,

for $i = 1, 2, \ldots, M$, where

(2)  (a) $x_{ij}(k) \ge 0$,
     (b) $\sum_{j=1}^{M} x_{ij}(k) \le x_i(k)$,

and the production functions, $g_i$, are assumed known, together with
the initial quantities, $x_i(0) = c_i$.

The $x_{ij}(k)$ are to be chosen so as to maximize some pre-assigned
function

(3)  $R_N = F(x_1(N), x_2(N), \ldots, x_M(N))$,

of  the  final  resources. 

In  many  cases,  as  we  shall  see  in  Chapter  6,  there  are  other  constraints 
in  addition  to  those  of  (2). 


If we set

(4)  $f_N(c_1, c_2, \ldots, c_M) = \max_{\{x_{ij}\}} R_N$,

we obtain, as before, the recurrence relations

(5)  $f_N(c_1, c_2, \ldots, c_M) = \max_{\{y_{ij}\}} f_{N-1}\bigl(c_1 - \sum_{j=1}^{M} y_{1j} + g_1(y_{11}, y_{21}, \ldots, y_{M1}), \ldots\bigr)$

for $N \ge 2$, where the $y_{ij}$ are restricted by the relations

(6)  (a) $y_{ij} \ge 0$,
     (b) $\sum_{j=1}^{M} y_{ij} \le c_i$,  $i = 1, 2, \ldots, M$,

and

(7)  $f_1(c_1, c_2, \ldots, c_M) = F(c_1, c_2, \ldots, c_M)$.

Existence and uniqueness theorems covering the unbounded versions
of these general processes will be given in Chapter IV, in conjunction
with a better notation. We shall encounter a particular example of this
equation further along in connection with the bottleneck processes of
Chapter VI. In the present chapter we shall discuss briefly some of the
difficult computational problems raised in maximizing over a
multi-dimensional domain.

§  21.  Multi-dimensional  structure  theorems 

It is not difficult to extend the results we obtained in the
one-dimensional case concerning convexity and concavity of the solutions
of the functional equation of (8.1) to the multi-dimensional equations
of § 20.

Let $G(x)$ be a scalar function of a vector variable $x$. It is said to
be convex if

(1)  $G(\lambda x + (1 - \lambda) y) \le \lambda G(x) + (1 - \lambda) G(y)$

for all $\lambda$ in the range $0 \le \lambda \le 1$. The function is concave if the inequality
goes the other way.

The  multi-dimensional  analogue  of  Lemma  1,  proved  in  §  13,  is  valid 
and  the  proof  is  precisely  the  same.  Using  the  lemma,  we  can  establish 
the  result  below. 

Before stating the result, let us introduce a more convenient notation.
Let $x$ denote the vector whose components are $x_i$, and $y^{(j)}$ denote the
vector whose components are $y_{ij}$, for $1 \le i, j \le M$. Then, in terms of
the process described above, we have

(2)  (a) $x \ge \sum_{j=1}^{M} y^{(j)}$,
     (b) $y^{(j)} \ge 0$,

where the notation $y \ge 0$ signifies that all components of $y$ are
non-negative. Let $D(x, y)$ denote the domain defined by (2).

Theorem 10. If $r(x, y)$ and $a(x, y)$ are continuous concave functions of
$x$ and $y$ for all $x, y \ge 0$, and $r(x, y)$, $a(x, y)$ are monotone increasing in
the components of $x$, then the functions $\{f_N(x)\}$ defined by the equations

(2)  $f_1(x) = \max_{D(x, y)} r(x, y)$,
     $f_{N+1}(x) = \max_{D(x, y)} [r(x, y) + f_N(a(x, y))]$

are all concave functions of $x$ for $x \ge 0$.
This implies a unique optimal policy for each $N$, if $r(x, y)$ is strictly
concave.

The importance of this result resides in the following. If we have
an $N$-stage process where $k$ decisions must be made at each stage, the
functional equation approach reduces the $Nk$-dimensional maximization
problem to a set of $N$ $k$-dimensional problems. Although this is an
essential reduction, the $k$-dimensional maximization problems themselves
possess thorny features.

If,  however,  the  function  of  k  variables  we  are  maximizing  is  strictly 
concave,  we  know  that  it  possesses  a  unique  relative  maximum  which 
is  the  absolute  maximum.  Given  this  additional  information  that  the 
function  under  investigation  has  a  unique  relative  maximum,  we  should 
be  able  to  determine  a  search  procedure  for  the  location  of  this  maximum 
which  is  far  more  efficient  than  the  search  procedure  we  would  employ 
for  a  general  function. 

§  22.  Locating  the  unique  maximum  of  a  concave  function 

The  determination  of  optimal  search  procedures  *  for  the  location  of 
the  maximum  of  a  concave  function  or,  conversely,  for  the  minimum  of 
a  convex  function,  is  an  extremely  important  and  difficult  problem  which 
has  not  been  solved  to  date.  The  solution  has,  however,  been  obtained 
in  the  one-dimensional  case  for  the  more  general  situation  where  the 
function  is  unimodal,  which  is  to  say  possesses  a  single  relative  maximum. 

Let us pose the problem in the following terms. The function $y = f(x)$
is a strictly unimodal function defined on the interval $[0, L_n]$. We wish
to determine the maximum $L_n$ with the property that we can always
locate the maximum of $y = f(x)$ on a sub-interval of unit length by
calculating at most $n$ values of the function $f(x)$. Since the maximum
may not exist, it is safer to begin by setting

(1)  $F_n = \sup L_n$.

We then have the following result.

Theorem 11. $F_n$ is the $n$th Fibonacci number, i.e., $F_0 = F_1 = 1$ and

(2)  $F_n = F_{n-1} + F_{n-2}$

for $n \ge 2$.

Proof. The definition of $F_0$ is a matter of convention; on the other
hand the value of $F_1$ is determined by the process.

*  It  is  actually  not  easy  to  specify  precisely  what  we  mean  by  an  optimal  search 
procedure.  It  clearly  depends  upon  the  type  of  equipment  we  have,  the  type  of 
operations  we  permit,  the  “cost”  of  these  operations,  and  so  on.  Consequently, 
there are a variety of problems of the above type which may be posed. The subject
has  not  been  explored  to  any  extent. 

Let us now proceed inductively. Fix $n$ and calculate the values
$y_1 = f(x_1)$, $y_2 = f(x_2)$ where $0 < x_1 < x_2 < L_n$. If $y_1 > y_2$, the maximum
occurs on $(0, x_2)$ since $f(x)$ is strictly unimodal. If $y_2 > y_1$, the
maximum is on $(x_1, L_n)$. If $y_1 = y_2$, choose either of the above intervals,
even though we know the maximum occurs on $(x_1, x_2)$. Thus, at each
stage after the first computation we are left with a subinterval and the
value of $f(x)$ at some interior point $x$. Since values at the ends of an
interval furnish no information per se, we restrict our attention to the
interior points.

For $n = 2$, $L_n = 2 - \epsilon$, $x_1 = 1 - \epsilon$, $x_2 = 1$, for arbitrarily small $\epsilon > 0$.
From the preceding argument it follows that $F_2 = 2 = F_1 + F_0$.

Consider the case where $n > 2$ and assume that $F_k = F_{k-1} + F_{k-2}$
for $k = 2, \ldots, n - 1$. Let us begin by showing that

(3)  $F_n \le F_{n-1} + F_{n-2}$.

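For if we calculate $f(x)$ at $x_1$ and $x_2$ on $(0, L_n)$ we have

[Figure 3: the interval from 0 to $L_n$, with the points $x_1$ and $x_2$ marked]

If $y_1 > y_2$, we obtain the new picture

[Figure 4: the interval from 0 to $x_2$, with the point $x_1$ marked]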

In this case $x_2 < F_{n-1}$ since we have only $(n - 2)$ additional choices
with $x_1$ a first choice, for the case $k = n - 1$. Moreover, $x_1 < F_{n-2}$,
since the maximum could occur on $[0, x_1]$, with two choices of $x$ already
used.

Similarly if $y_2 > y_1$, we have $L_n - x_1 < F_{n-1}$.

Thus in all cases $L_n < F_{n-1} + F_{n-2}$, which yields (3). Now choose
$L_n$, $x_2$, $x_1$ arbitrarily close to their respective upper bounds $F_{n-1} + F_{n-2}$,
$F_{n-1}$ and $F_{n-2}$ respectively. Then $F_n = F_{n-1} + F_{n-2}$. This yields
the proof of Theorem 11. Furthermore, it yields the optimal policy,
since each $x_i$ is either discarded or is the optimal first choice for the
remaining subinterval.
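
The policy that emerges from the proof can be sketched in Python as follows. This is only an illustrative reading of the argument: the name `fibonacci_search`, the treatment of ties, and the small relative displacement `eps` used for the last evaluation (standing in for the arbitrarily small $\epsilon$ of the case $n = 2$) are assumptions of this sketch.

```python
def fibonacci_search(f, lo, hi, n, eps=1e-9):
    """Bracket the maximum of a strictly unimodal f on [lo, hi].

    Uses exactly n >= 2 evaluations of f and returns an interval (lo, hi)
    of length roughly (hi - lo) / F_n, with F_0 = F_1 = 1 as in the text.
    """
    if n < 2:
        raise ValueError("at least two evaluations are needed")
    F = [1, 1]
    for _ in range(2, n + 1):
        F.append(F[-1] + F[-2])
    delta = eps * (hi - lo)  # offset used when the two points would coincide

    k = n
    x1 = lo + F[k - 2] / F[k] * (hi - lo) - (delta if k == 2 else 0.0)
    x2 = lo + F[k - 1] / F[k] * (hi - lo)
    y1, y2 = f(x1), f(x2)
    while k > 2:
        k -= 1
        shift = delta if k == 2 else 0.0
        if y1 > y2:
            # Maximum lies on (lo, x2); x1 becomes the upper interior point.
            hi, x2, y2 = x2, x1, y1
            x1 = lo + F[k - 2] / F[k] * (hi - lo) - shift
            y1 = f(x1)
        else:
            # Maximum lies on (x1, hi); x2 becomes the lower interior point.
            lo, x1, y1 = x1, x2, y2
            x2 = lo + F[k - 1] / F[k] * (hi - lo) + shift
            y2 = f(x2)
    # Final comparison: discard the part beyond the worse point.
    if y1 > y2:
        hi = x2
    else:
        lo = x1
    return lo, hi
```

For example, `fibonacci_search(lambda x: -(x - 0.3) ** 2, 0.0, 1.0, 20)` returns a bracket around 0.3 after twenty evaluations.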

The sequence $\{F_n\}$ has as its first few terms

(4)  1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

with $F_{20} > 10,000$. Hence the maximum of a strictly unimodal function
can always be located within $10^{-4}$ of the original interval length with
at most 20 calculations of the value of the function.
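
The claim about $F_{20}$ is easy to check by direct computation. The helper name `fibonacci_numbers` below is an assumption of this sketch; the indexing follows the convention $F_0 = F_1 = 1$ of Theorem 11.

```python
def fibonacci_numbers(n_max):
    """Return [F_0, ..., F_{n_max}] with the convention F_0 = F_1 = 1."""
    F = [1, 1]
    while len(F) <= n_max:
        F.append(F[-1] + F[-2])
    return F[: n_max + 1]


F = fibonacci_numbers(20)
# Smallest index n for which F_n exceeds 10,000.
first_over = next(n for n, value in enumerate(F) if value > 10_000)
print(F[:10])                     # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(first_over, F[first_over])  # 20 10946
```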

It is easy to obtain an explicit representation for $F_n$, namely

(5)  $F_n = \frac{r_2 - 1}{r_2 - r_1} r_1^n + \frac{1 - r_1}{r_2 - r_1} r_2^n$,

where

(6)  $r_1 = \frac{1 + \sqrt{5}}{2} \approx 1.61$,  $r_2 = \frac{1 - \sqrt{5}}{2} \approx -.61$.

From this we see that $F_{n+1}/F_n \to r_1 \approx 1.61$ as $n \to \infty$. Thus, for
large $n$, a uniform approximate procedure is to choose the two first
values at distances $L/r_1$ from either end, where $L$ is the length of the
interval. This is a useful technique for machine computation.
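
A minimal sketch of this uniform procedure, commonly known as golden-section search, is given below. The function name and the decision to repeat the same rule at every step are assumptions of the example; the text itself specifies only the placement of the first two values.

```python
import math

R1 = (1 + math.sqrt(5)) / 2  # the root r_1 of (6)


def golden_section_search(f, lo, hi, n):
    """Bracket the maximum of a strictly unimodal f using n >= 2 evaluations.

    Each pair of points sits at distance L / r_1 from either end of the
    current interval of length L, so one old point is reused at every step.
    """
    x1 = hi - (hi - lo) / R1
    x2 = lo + (hi - lo) / R1
    y1, y2 = f(x1), f(x2)
    for _ in range(n - 2):
        if y1 > y2:
            # Keep (lo, x2); the old x1 is already at distance L'/r_1 from lo.
            hi, x2, y2 = x2, x1, y1
            x1 = hi - (hi - lo) / R1
            y1 = f(x1)
        else:
            # Keep (x1, hi); the old x2 is already at distance L'/r_1 from hi.
            lo, x1, y1 = x1, x2, y2
            x2 = lo + (hi - lo) / R1
            y2 = f(x2)
    if y1 > y2:
        hi = x2
    else:
        lo = x1
    return lo, hi
```

After $n$ evaluations the bracket has length $L / r_1^{\,n-1}$, slightly longer than the $L / F_n$ of the Fibonacci policy, but the rule does not depend on knowing $n$ in advance.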

Consider now the related problem where the unimodal function is
defined only for discrete values of $x$. Let $K_n$ be the maximum number
of points such that the maximum of the function can always be identified
in $n$ computations. The same type of proof as above establishes

Theorem 12. $K_0 = 1$, $K_1 = 1$, $K_2 = 2$, $K_3 = 4$, and

(7)  $K_n = 1 + F_n$,  $n \ge 3$.

§ 23. Continuity and memory

Let us suppose that we have a function of two variables, $f(x, y)$,
depending continuously on $x$ and $y$ for $x \ge 0$ and $0 \le y \le x$. Define
the function

(1)  $g(x) = \max_{0 \le y \le x} f(x, y)$.

It is clear that $g(x)$ will be continuous, but the function $y = y(x)$
yielding the maximum need not be continuous. We have already seen
an example of this in connection with the functional equation of § 15.

Suppose, however, that we restrict $f(x, y)$ to be a strictly concave
function of $y$ for all $y$ in $[0, x]$, for $x \ge 0$.


It  is  clear  that  as  x  varies,  the  maximizing  y  will  now  be  a  continuous 
function  of  x. 

Let  us  see  how  we  can  utilize  this  information  to  simplify  the  memory 
problem  for  computing  machines.  Consider  the  equations 

(2)  $f_{N+1}(x) = \max_{0 \le y \le x} [g(y) + h(x - y) + f_N(ay + b(x - y))]$,  $N = 1, 2, \ldots$

If we have no information concerning the location of a maximizing $y$,
we must have available all values of $f_N(z)$ for $0 \le z \le ax$ in order to
determine $f_{N+1}(x)$. Suppose, however, we take $g(x)$ and $h(x)$ to be
strictly concave as well as continuous. In this case, $f_N(x)$ is strictly
concave for each $N$ and the function $g(y) + h(x - y) + f_N(ay + b(x - y))$
is strictly concave for $0 \le y \le x$, and what is most important
the function $y_N(x)$ which yields the maximum in (2) is unique and
continuous as a function of $x$.

It follows then that if we are using an $x$-grid of values $0, \Delta, 2\Delta, \ldots$
to compute $f(x)$, the complete set of values of $f_N(z)$ for $0 \le z \le ax$ is
not required to compute $f_{N+1}(x)$, but only the values of $f_N(z)$ in a
relatively small neighborhood of $z = y_N(x - \Delta)$.

The  same  idea  extended  to  multi-dimensional  equations  can  result  in 
a  considerable  saving  of  memory  space  in  computing  machines.  Recipro¬ 
cally,  we  will  be  able  to  solve  problems  using  existing  machines  which 
might  otherwise  escape  them.  In  any  case,  a  great  saving  in  running 
time  will  result,  once  again  increasing  the  feasibility  of  a  solution  by 
these  means. 
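
The idea can be sketched in Python as follows. Here one step of equation (2) of this section is carried out on an increasing $x$-grid, and the search over $y$ at each grid point is confined to a window around the maximizer found at the previous grid point. The name `step_windowed`, the window half-width, and the brute-force scan inside the window are assumptions of this sketch; in particular the window must be chosen wider than the distance the maximizer can move between neighboring grid points, which the text does not quantify.

```python
import math


def step_windowed(f_N, g, h, a, b, xs, n_y=400, window=0.05):
    """Compute f_{N+1} on the increasing grid xs from a callable f_N.

    Returns (values, argmax): the tabulated f_{N+1} and the maximizing y
    at each grid point. Only y within `window` of the previous maximizer
    is examined, so only nearby values of f_N are ever requested.
    """
    values, argmax = [], []
    y_prev = None
    for x in xs:
        if y_prev is None:
            lo, hi = 0.0, x  # no information yet: scan the whole range
        else:
            lo = max(0.0, y_prev - window)
            hi = min(x, y_prev + window)
        best_val, best_y = -math.inf, lo
        for j in range(n_y + 1):
            y = min(hi, lo + (hi - lo) * j / n_y)  # clamp against rounding
            val = g(y) + h(x - y) + f_N(a * y + b * (x - y))
            if val > best_val:
                best_val, best_y = val, y
        values.append(best_val)
        argmax.append(best_y)
        y_prev = best_y
    return values, argmax
```

Passing `window=float("inf")` recovers the full search over $0 \le y \le x$, which gives a simple way to check that a chosen window is wide enough.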

§  24.  Stochastic  allocation  processes 

In  the  preceding  pages  of  the  chapter,  we  have  considered,  in  greater 
and  lesser  detail,  various  multi-stage  allocation  processes  characterized 
by  the  property  that  the  outcome  of  any  decision  was  uniquely  determined 
by  the  choice  of  this  decision.  Processes  of  this  type  we  call  deterministic. 

Not  all  multi-stage  processes,  however,  possess  this  property,  and,  as 
a  matter  of  fact,  many  of  the  most  interesting  are  quite  definitely  not 
of  this  type.  Let  us  consider  here  one  important  class  of  non-deterministic 
processes  in  which  the  effect  of  a  decision  is  to  determine  a  distribution 
of  outcomes  in  the  sense  of  probability  theory.  Processes  of  this  type 
we  shall  call  stochastic. 

We  shall  limit  ourselves  in  this  book  to  processes  of  these  two  types. 
The  discussion  of  the  origin  of  processes  of  more  complicated  nature, 
and  their  treatment,  we  shall  defer  to  another  place. 

From  the  mathematical  point  of  view,  stochastic  processes  furnish 
varied  classes  of  fascinating  analytic  problems,  and  throw  unexpected 
light  upon  many  processes  of  supposedly  deterministic  nature.  Appli¬ 
cations  of  the  theory  are  furnished  by  scores  of  processes  drawn  from 
biologic,  economic,  engineering,  and  physical  fields. 

Returning  to  our  domain  of  decision  processes,  a  fundamental  problem 
confronting  us  is  that  of  defining  what  we  mean  by  an  optimal  policy 
in the face of uncertain outcomes. What is crystal clear, but so often
overlooked  in  a  posteriori  comment,  is  the  fact  that  a  lack  of  complete 
control  over  a  process  effectively  prevents  a  guarantee  of  a  maximum 
return. 

On  the  other  hand,  despite  this  Damoclean  sword  of  uncertainty, 
there  must  exist  some  means  of  comparing  policies,  taking  into  account 
the  possible  fluctuation  of  outcomes. 

What  causes  a  major  difficulty  in  applications  is  not  that  it  is  hard 
to find such a measure, but rather that it is hard to find a unique measure.
In  short,  it  must  be  emphasized  that  there  is  no  one  method  which  can 
have  any  pretensions  to  the  title  of  “best.”  Whatever  method  is  used 
depends  to  a  large  extent  upon  various  analytic  and  arithmetic  aspects 
of the process, and, it must be confessed, upon the philosophical and
psychological  attitudes  of  the  decision-makers. 

Having  thus  dwelt  upon  the  dismal  side  of  the  matter,  to  assuage 
our  consciences,  let  us  now  proceed  more  constructively. 

The  general  idea,  and  this  is  fairly  unanimously  accepted,  is  to  use 
some  average  of  the  possible  outcomes  as  a  measure  of  the  value  of  a 
policy.  It  is  in  the  choice  of  this  average  that  the  difficulties  arise. 

Let  us  point  out  in  passing  that  there  is  a  definite  lack  of  unanimity 
concerning  the  use  of  averages  in  determining  policies  for  stochastic 
processes  which  may  be  carried  through  once,  or  at  best,  only  a  few 
times.  In  some  cases,  “distribution-free”  policies  can  be  obtained.  In 
general,  however,  there  seems  to  be  no  other  approach  to  these  questions 
than  the  usual  one  we  present  here. 

The first average, or criterion, we shall employ is the common arithmetic
weighted average, or expected value. Due to the linearity of this
average, it possesses a most important invariant property which greatly
simplifies  the  functional  equations  which  describe  the  process.  This 
property  enables  the  future  decisions  to  be  based  solely  upon  the  present 
state  of  the  system,  independently  of  the  past  history  of  the  process. 

The second criterion, which is far less frequently used, is the probability
of  achieving  at  least  a  certain  level  of  return.  This  also  possesses  the 
proper  invariant  structure  as  far  as  multi-stage  processes  are  concerned. 
We  will  discuss  this  criterion  in  greater  detail  in  a  subsequent  chapter. 

§  25.  Functional  equations 

Let  us  now  consider  a  simple  stochastic  version  of  the  deterministic 
process  considered  in  §  2,  and  show  that  the  same  functional  equation 
technique  is  applicable. 

In place of assuming that the outcome of a division of $x$ into $y$ and
$x - y$ is a return of $g(y) + h(x - y)$, leaving a new quantity $ay + b(x - y)$,
let us assume that with probability $p_1$ there is a return
of $g_1(y) + h_1(x - y)$ and a remaining quantity $a_1 y + b_1(x - y)$, and
with probability $p_2 = 1 - p_1$ a return of $g_2(y) + h_2(x - y)$ and a new
quantity $a_2 y + b_2(x - y)$.

Let us define

(1)  $f_N(x)$ = the expected total return of an $N$-stage process, obtained
     using an optimal policy, starting with an initial quantity $x$.

Then, as before, we obtain the equations

(2)  $f_1(x) = \max_{0 \le y \le x} [p_1 (g_1(y) + h_1(x - y)) + p_2 (g_2(y) + h_2(x - y))]$,
     $f_{N+1}(x) = \max_{0 \le y \le x} [p_1 [g_1(y) + h_1(x - y) + f_N(a_1 y + b_1(x - y))] + p_2 [g_2(y) + h_2(x - y) + f_N(a_2 y + b_2(x - y))]]$,

for $N \ge 1$.

The  equations  have  the  same  analytic  structure  as  those  obtained 
from  the  deterministic  process.  By  agreeing  to  use  the  "expected  value” 
as  the  measure  of  the  value  of  a  policy,  we  have  eliminated  the  stochastic 
aspects  of  the  process,  at  least  as  far  as  the  analysis  is  concerned. 
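
To make the point concrete, here is a small Python sketch of the expected-value recurrence on a grid. It accepts any finite list of outcomes, of which the two-outcome process above is a special case. The name `expected_return_tables`, the tuple layout `(p, g, h, a, b)`, the grid sizes, and the linear interpolation are assumptions of the example.

```python
import math


def expected_return_tables(outcomes, N, x_max, n_grid=101, n_y=101):
    """Tabulate f_N(x) of the stochastic process on a grid over [0, x_max].

    outcomes: list of (p, g, h, a, b), where with probability p the return
    is g(y) + h(x - y) and the remaining quantity is a*y + b*(x - y).
    The probabilities are assumed to sum to 1, with 0 <= a, b <= 1.
    """
    dx = x_max / (n_grid - 1)
    xs = [i * dx for i in range(n_grid)]

    def interp(table, z):
        # Linear interpolation of the previous table (hypothetical helper).
        z = min(max(z, 0.0), x_max)
        i = min(int(z / dx), n_grid - 2)
        t = (z - xs[i]) / dx
        return (1 - t) * table[i] + t * table[i + 1]

    f = None  # no continuation term for the one-stage process
    for _ in range(N):
        new = []
        for x in xs:
            best = -math.inf
            for j in range(n_y):
                y = x * j / (n_y - 1)
                val = 0.0
                for p, g, h, a, b in outcomes:
                    term = g(y) + h(x - y)
                    if f is not None:
                        term += interp(f, a * y + b * (x - y))
                    val += p * term  # expectation over the outcomes
                best = max(best, val)
            new.append(best)
        f = new
    return xs, f
```

The loop is the deterministic one with a single extra average over outcomes, which is the sense in which the stochastic aspects drop out of the analysis.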

§  26.  Stieltjes  integrals 

For  those  who  are  familiar  with  the  Riemann-Stieltjes  integral,  there 
is  a  much  more  compact  way  of  writing  the  above  equations.  Let 

(1)  $dG(u, v; x, y)$ = distribution function of a return of $u$ and a
     remaining quantity of $v$, starting with an initial
     quantity $x$ and making an allocation of $y$.

Taking $f_N(x)$ to be defined as above, we obtain the equations

(2)  $f_1(x) = \max_{0 \le y \le x} \int u \, dG(u, v; x, y)$,
     $f_{N+1}(x) = \max_{0 \le y \le x} \int [u + f_N(v)] \, dG(u, v; x, y)$.

It  is  much  simpler  to  describe  the  processes,  to  establish  existence 
and  uniqueness  theorems  for  the  resultant  functional  equations,  and  to 
derive  analytic  properties  of  the  solution,  using  this  short-hand  notation. 
The  basic  mathematical  ideas  are,  however,  the  same. 

Equations  of  this  type  will  be  discussed  again  in  Chapter  III  within 
a  more  general  framework. 
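
As a check on the notation, the two-outcome process of § 25 can be recovered as a special case. The following lines are a sketch under the assumption that $G$ places mass $p_i$ on the single point $(u_i, v_i)$, $i = 1, 2$; the symbols $u_i$ and $v_i$ are introduced here only for illustration.

```latex
% Assumption: G(u, v; x, y) puts mass p_i on the point (u_i, v_i), i = 1, 2, with
%   u_i = g_i(y) + h_i(x - y),   v_i = a_i y + b_i (x - y).
\begin{align*}
\int [u + f_N(v)] \, dG(u, v; x, y)
  &= \sum_{i=1}^{2} p_i \, [u_i + f_N(v_i)] \\
  &= \sum_{i=1}^{2} p_i \, [g_i(y) + h_i(x - y) + f_N(a_i y + b_i (x - y))] .
\end{align*}
```

Maximizing the last expression over $0 \le y \le x$ gives the second equation of § 25 (2).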


Exercises  and  Research  Problems  for  Chapter  I 

1. Let us define the function

$f_N(a) = \max_R [x_1 x_2 \cdots x_N]$,

where $R$ is the region determined by the conditions

a. $x_1 + x_2 + \ldots + x_N = a$, $a > 0$.

b. $x_i \ge 0$.

Prove that $f_N(a)$ satisfies the recurrence relation

$f_N(a) = \max_{0 \le x \le a} x f_{N-1}(a - x)$,  $N \ge 2$,

with $f_1(a) = a$.

2. Show inductively that $f_N(a) = a^N/N^N$, and hence establish the
arithmetic-geometric mean inequality,

$\left( \frac{x_1 + x_2 + \ldots + x_N}{N} \right)^N \ge x_1 x_2 \cdots x_N$

for $x_i \ge 0$, with equality only if $x_1 = x_2 = \ldots = x_N$.
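
A numerical check of Exercises 1 and 2 may help before attempting the induction. The sketch below iterates the recurrence of Exercise 1 on a grid; the name `product_tables` and the restriction of $x$ to grid points are assumptions of the example, and the result is only an approximation to $f_N$ away from points where the optimal $x$ falls on the grid.

```python
def product_tables(N, a_max=1.0, n_grid=201):
    """Tabulate f_N(a) = max x * f_{N-1}(a - x) on a uniform grid over [0, a_max].

    The choice x is restricted to grid points, so that a - x is again a
    grid point and no interpolation is needed.
    """
    da = a_max / (n_grid - 1)
    grid = [i * da for i in range(n_grid)]
    f = grid[:]  # f_1(a) = a
    for _ in range(2, N + 1):
        # f[i - j] is the previous table evaluated at a - x = grid[i - j].
        f = [max(grid[j] * f[i - j] for j in range(i + 1)) for i in range(n_grid)]
    return grid, f


grid, f3 = product_tables(3, a_max=3.0, n_grid=301)
print(f3[-1], 3.0 ** 3 / 3 ** 3)  # both close to 1
```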

3. Let us define the function

$f_N(a) = \min_R \sum_{i=1}^{N} x_i^p$,  $p > 0$,

where $R$ is the region defined by

a. $\sum_{i=1}^{N} x_i \ge a$, $a > 0$.

b. $x_i \ge 0$.

Show that $f_N(a)$ satisfies the recurrence relation

$f_N(a) = \min_{0 \le x \le a} [x^p + f_{N-1}(a - x)]$,  $N \ge 2$,

with $f_1(a) = a^p$.

4. Show that $f_N(a) = a^p c_N$, where $c_N$ depends only upon $N$ and $p$,
and thus that

$c_N = \min_{0 \le x \le 1} [x^p + (1 - x)^p c_{N-1}]$.

Determine $c_N$ for the ranges $0 < p < 1$, $1 < p$, respectively.

5.  Consider  the  problem  of  minimizing  the  function 

s 

F  (xlt  x2 . xN)  =  Z  pt  sf/(si  -f  x,), 

i  -•  1 

where  the  pt  and  st  are  parameters  subject  to  the  conditions  pi 

Z  pi  =  1,  St  >  0,  and  the  xt  range  over  the  region  defined  by  xi 
i 

X 

Z  xt  =  a. 

i  i 

Obtain the corresponding recurrence relations and show that the
solution  is  of  the  form 

*>  =  0,  0 <;/<:*. 

X)>  0,  t  + 

under  a  suitable  reordering  of  the  x/s. 


6. Consider the problem of maximizing the function

$F(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} \varphi(x_i)$,

subject to the constraints $x_i \ge 0$, $\sum_{i=1}^{N} x_i = c$. Show that the maximum
is $\varphi(c)$, under the assumption that $\varphi(x)$ is convex.

7. Consider the case where $\varphi(x)$ is a monotonically increasing function
which is strictly concave. Show that the solution of the corresponding
functional equation,

$f_N(c) = \max_{0 \le y \le c} [\varphi(y) + f_{N-1}(c - y)]$,  $N \ge 2$,
$f_1(c) = \varphi(c)$,

has the form

$y_N = 0$,  $0 \le c \le c_N$,
$y_N = z_N$,  $c > c_N$,

where $z_N$ is the unique solution of

$\varphi'(y) = f'_{N-1}(c - y)$,

for $N \ge 2$, and show how to determine the sequence $\{c_N\}$.

8. Obtain explicit recurrence relations, and the analytic form of the
sequence for the case where

$\varphi(y) = y - by^2$,  $b > 0$,

and $c$ is restricted to the range $0 \le c \le 1/2b$.

9.  What  are  the  analogues  of  these  result  for  the  case  where  the  function 

.v 

r  has  the  form  E  <pt  (x(),  where  each  function  <pt  ( x )  satisfies  the  same 

»  -  t 

conditions  as  above? 


10. Carry through the corresponding analysis for the problem of minimizing
$F(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} \varphi(x_i)$, subject to $x_i \ge 0$,
$\sum_{i=1}^{N} x_i = a$, in the case where $\varphi(x)$ is a non-negative
monotonically increasing function which is strictly convex. Consider, in
particular, the case where

$\varphi(x) = x + bx^2$,  $b > 0$.

11.  Consider  the  problem  of  maximizing 


subject  to 


x 

F  (*„  xtl  .  ...xx:  ylf  yt,.  ...yN)=  X  <p  ( xt ,  yt) , 

i  -  1 

X  X 

Xt.  yt  ^  0,  X  Xt  =  ct,  X  yt  =  c„ 

i-l  i  -  1 


where  <p  (x,  y)  is  a  strictly  concave  function,  monotone  increasing  in  x  and 

y- 

Show  that  the  corresponding  functional  equation 

fx  (c„  ct)  =  Max  [95  ( x,y )  +fx-i  (c,— x,  c, — y)] , 

0  <  *  <  r, 

O  <  V  <  't 

possesses  for  each  N  ;>  2  a  solution  of  the  form 


and  show  how  to  determine  the  boundary  curves. 

Consider,  in  particular,  the  case  where 

(p  (x,y)  =  »i  x  +  r,  y  +  »,  a:!  +  2 w3  xy  +  u,  y*, 

12. Under the assumption that $\varphi(x)$ is a monotonically increasing strictly
concave function, determine the maximum of
$F(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} \varphi(x_i)$ over the region determined by

a. $\sum_{i=1}^{N} x_i \le c_1$,  $x_i \ge 0$,

b. $\sum_{i=1}^{N} x_i^p \le c_2$,

for $p > 1$ and $p < 1$ respectively.

13.  Obtain  the  recurrence  relations  arising  from  the  problem  of  mini- 

X 

mizing  E  <pt  (x»)  subject  to  the  restrictions 

«  -  i 

a.  0  <;  x<  <;  ri , 
x 

b.  E  y>t  ( xt )  ^  a , 

i  -  1 

under  the  assumptions  that  each  yu  (x)  is  a  non-negative  monotone  in- 

x 

creasing  function  of  x,  with  E  (r<)  ;>  a. 

i  -  1 

14.  Consider  the  corresponding  multi-dimensional  problem  of  mini- 

x 

mizing  E  <pi  (x<,  y<)  subject  to  the  constraints 

t  —  1 

a.  0  <:  Xi  ^  ri,  0  ^  yt  <,  s( , 
y 

b.  E  y>t  (x,p  yt)  ^  a , 

i  -  1 

under  appropriate  assumptions  concerning  the  sequence  {yu}. 

15.  Determine  the  maximum  of  the  function  x,  x,  . .  .  x.v  over  the  region 
defined  by 

.v 

a.  E  xi  —  1,  Xi  ;>  0, 

«  -  i 

b.  bxk  ^  xt  4 1,  b  >  1,  k  =  1,  2,  . .  .,  N — 1. 

x 

Consider  the  same  problem  for  the  function  E  x^,  for  different  ranges 

i  -  i 

of  p. 

16. Consider the recurrence relations

$f_1(x) = \max_{0 \le y \le x} [g(y) + h(x - y)]$,
$f_{N+1}(x) = \max_{0 \le y \le x} [g(y) + h(x - y) + f_N(ay + b(x - y))]$,

where $g(y) = c_1 y^d$, $h(y) = c_2 y^d$, with $c_1, c_2, d > 0$. Show that
$f_N(x) = u_N x^d$, where

$u_1 = \max_{0 \le v \le 1} [c_1 v^d + c_2 (1 - v)^d]$,
$u_{N+1} = \max_{0 \le v \le 1} [c_1 v^d + c_2 (1 - v)^d + u_N (av + b(1 - v))^d]$.

Show that

$\lim_{N \to \infty} u_N = \max_{0 \le v \le 1} \left[ \frac{c_1 v^d + c_2 (1 - v)^d}{1 - (av + b(1 - v))^d} \right]$.


17. Consider the process described in § 2 under the assumption that it is
not required to use all the resources available at each stage. Show that the
functional equation obtained in this way has the form

$f(x) = \max_{y_1 + y_2 \le x} [g(y_1) + h(y_2) + f(ay_1 + by_2 + x - y_1 - y_2)]$.

Does this equation have a solution if $g(x)$ and $h(x)$ are both concave
functions of $x$? Does it have a solution if they are both convex? Under
what conditions upon $g(x)$ and $h(x)$ does it have a solution with a
corresponding optimal policy?

18. Show that if there is a solution with $y_1 + y_2 < x$, $y_1, y_2 > 0$, then
$g'(y_1)/(1 - a) = h'(y_2)/(1 - b)$ under suitable assumptions concerning
$g$ and $h$. What is the interpretation of this solution?


19.  Consider  the  process  described  in  §  2  under  the  assumption  that  addi¬ 
tional  resources  are  added  at  each  stage,  either  externally  or  from  the 
conversion of all or part of the return $g(y) + h(x - y)$ into resources,
and  obtain  the  corresponding  recurrence  relations. 

20. Consider the process described in § 2. Define $g_N(z)$ as the minimum
cost required to obtain a total return of $z$ at the end of $N$ stages. Show that

$g_1(z) = \min [(1 - a) y_1 + (1 - b) y_2]$,  over $g(y_1) + h(y_2) = z$, $y_1, y_2 \ge 0$,

$g_{N+1}(z) = \min_{y_1, y_2 \ge 0} [(1 - a) y_1 + (1 - b) y_2 + g_N(z - g(y_1) - h(y_2))]$.

21. There are $N$ different types of items, with the $i$th item having weight
$w_i$ and a value $v_i$. It is desired to load a ship having a total capacity of $w$
pounds with a cargo of greatest possible value. Show that this problem
leads to the problem of determining the maximum over the $n_i$ of the
linear form $L = \sum_{i=1}^{N} n_i v_i$, subject to the constraints $n_i = 0, 1, 2, \ldots$,
$\sum_{i=1}^{N} n_i w_i \le w$, and thus that this problem leads to the recurrence relations

$f_1(w) = v_1 [w/w_1]$,  ($[a]$ denotes the greatest integer contained in $a$)

$f_{N+1}(w) = \max_{0 \le x \le [w/w_{N+1}]} [x v_{N+1} + f_N(w - x w_{N+1})]$,

where $x$ can assume only zero or integral values.
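
The recurrence of Exercise 21 translates almost directly into code. The Python sketch below assumes, for simplicity, that the weights and the capacity are positive integers, so that $f_N(w)$ need only be tabulated for $w = 0, 1, \ldots$; this restriction and the name `max_cargo_value` are assumptions of the example, not part of the exercise.

```python
def max_cargo_value(weights, values, capacity):
    """Greatest total value of a cargo, with any number of items of each type.

    weights, values: positive integer weights w_i and values v_i.
    capacity: integer capacity w of the ship.
    """
    w1, v1 = weights[0], values[0]
    # f_1(w) = v_1 * [w / w_1]
    f = [v1 * (w // w1) for w in range(capacity + 1)]
    for wk, vk in zip(weights[1:], values[1:]):
        # f_{N+1}(w) = max over x = 0, 1, ..., [w / w_{N+1}]
        f = [
            max(x * vk + f[w - x * wk] for x in range(w // wk + 1))
            for w in range(capacity + 1)
        ]
    return f[capacity]


print(max_cargo_value([3, 4, 5], [4, 5, 7], 10))  # 14: two items of weight 5
```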

22. Suppose that we have a herd of cattle and the prerogative, at the
end of the year, of sending one part of the herd to market, and retaining
the other part for breeding purposes. Assume that the dollar value of $y$
cattle sent to market is $\varphi(y)$, and that $z$ retained for breeding purposes
yield $az$, $a > 1$, at the beginning of the next year.

Show that the problem of determining a breeding policy which maximizes
the total return over an $N$-year period leads to the recurrence
relation

$f_1(x) = \max_{0 \le y \le x} \varphi(y)$,

$f_N(x) = \max_{0 \le y \le x} [\varphi(y) + f_{N-1}(a(x - y))]$.

23. Determine the structure of the optimal policies in the following cases:

a. $\varphi(y) = ky$, $k > 0$

b. $\varphi(y)$ is quadratic in $y$

c. $\varphi(y)$ is strictly convex

d. $\varphi(y)$ is strictly concave

24.  Formulate  the  equations  under  the  additional  restriction  that  cattle 
must  be  2  years  old  before  they  can  be  sold.  Take  into  account  feeding 
cost  and  mortality  rates. 

25.  Consider  the  case  in  which  there  are  probability  distributions  for  the 
price  and  demand. 

26. In problem 22, let $\varphi(x) = cx^d$, $c, d > 0$. Show that $f_N(x) = c_N x^d$,
where $c_1 = c$ and $c_{N+1} = \max_{0 \le r \le 1} [c r^d + c_N a^d (1 - r)^d]$, $N = 1, 2, \ldots$

Determine the asymptotic behavior of $c_{N+1}/c_N$ and $r_{N+1}/r_N$.

27. Suppose that we have a quantity $x$ of money, and that portions of
this money can be used for common goods, invested in bonds, or invested
in stocks. The return from $y$ dollars invested in bonds is $ay$ dollars, $a > 1$,
over a period of one year; the return from $z$ dollars invested in stocks is
$bz$ dollars, $b > 1$, over a period of one year. The utility of $w$ dollars spent
is $\varphi(w)$. How should the capital be utilized so as to derive a maximum
utility over an $N$ year period?

28.  Consider  the  same  problem  under  the  assumption  that  the  return 
from  stocks  is  a  stochastic  quantity. 

29. A sophomore has three girl friends, a blonde, a brunette, and a
redhead. If he takes one of the three to the Saturday night dance, the other
two  take  umbrage,  with  the  result  that  the  probability  that  they  will 
refuse  an  invitation  to  next  week’s  dance  increases.  Furthermore,  as  a 
result  of  his  invitation,  there  is  a  certain  probability  that  the  young  lady 
of  his  choice  will  be  more  willing  to  accept  another  invitation  and  a 
certain  probability  that  the  young  lady  will  be  less  willing. 

Assuming  that  feminine  memories  do  not  extend  back  beyond  one 
week,  what  dating  policy  maximizes  the  expected  number  of  dances  the 
sophomore  attends — with  a  date  ? 

30. Obtain a sequence of recurrence relations equivalent to determining
the minimum of the linear form $L = \sum_{i=1}^{N} x_i$, subject to the constraints
$x_i \ge 0$, $x_i + x_{i+1} \ge a_i$, $i = 1, 2, \ldots, N - 1$. Thus, or otherwise, show
that $\min L = \max_i a_i$, granted that one $a_i$ is positive.

31. Solve the corresponding problem for the case where the constraints
are $x_i + x_{i+1} + x_{i+2} \ge a_i$, $i = 1, 2, \ldots, N - 2$.

32. Determine the recurrence relations for the problem of minimizing
$L = \sum_{i=1}^{N} c_i x_i$, $c_i \ge 0$, subject to the constraints

$x_i \ge 0$, $b_i x_i + d_i x_{i+1} \ge a_i$, $i = 1, 2, \ldots, N - 1$.

33.  Solve  the  problem  formulated  above  in  (32)  for  the  case  where  the 
constraints  are 

a.  xi  xt  + 1  ;>  at,  i  1,2 . .V  —  1 ,  xk  a.v,  or 

b.  xt  Xt  + 1  ;>  at,  i  =  1 ,  2,  ....  A’  —  1 ,  .v,  ;>  a,  x.\  ;>  a.v,  or 

c.  Xt  4-  xt  1 1  -j  V(  4  s  ]>  tit,  i=l,2 . A  2,  a.v  _  1  +  a.v  a.v  -  1, 

a.v  >  a.v . 

plus  the  usual  constraint  ai  ^  0. 

34. Show how to approximate to f(x) in the interval [a, b] by means of a
linear function ux + v according to the following measures of deviation:

a. $\int_a^b (f(x) - ux - v)^2 \, dx$

b. $\max_{a \le x \le b} |f(x) - ux - v|$


35.  Suppose  that  it  is  necessary  to  traverse  a  distance  x.  If  we  travel  at  a 
speed  v  there  is  a  probability  p  ( v )  ds  of  being  stopped  in  the  interval 
(s,  s  +  ds)  and  incurring  a  delay  of  d  time  units.  At  what  fixed  speed 
should  we  travel  in  order  to  minimize  the  expected  time  required  to  cover 
a  distance  x?  (Greenspan) 


36.  Under  the  same  conditions  as  those  of  Problem  35,  at  what  speed 
should  we  travel  in  order  to  minimize  the  probability  of  requiring  more 
than  a  time  T  to  cover  the  distance  x? 


37.  Assume  that  there  is  a  penalty  of  p  dollars  when  stopped  and  that 
actual  travelling  time  costs  c  dollars  per  unit  time.  How  do  we  proceed  to 
minimize  expected  cost? 

38. Obtain a recurrence relation equivalent to the problem of minimizing
the quadratic form $Q_N = \sum_{k=1}^{N} (x_k - x_{k-1})^2$ over all sets of values for the
$x_i$ for which $\sum_{i=1}^{N} x_i^2 = 1$, $x_0 = c$.


39.  We  are  informed  that  a  particle  is  in  either  of  two  states,  which  we 
shall  call  S  and  T,  and  are  given  the  initial  probability  x  that  it  is  in  state 
T.  If  we  use  an  operation  A  we  reduce  this  probability  to  ax,  where  a  is  a 
positive  constant  less  than  1,  whereas  operation  L,  which  consists  of 
observing  the  particle,  will  tell  us  definitely  which  state  it  is  in.  It  is 
desired  to  transform  the  particle  into  state  S  in  a  minimum  time,  with 
certainty. 

If f(x) is defined to be the expected number of operations required to
achieve this goal, show that f(x) satisfies the equation

$f(x) = \min [1 + x f(1), \; 1 + f(ax)], \quad 0 < x \le 1,$

$f(0) = 0.$


40. Show that there is a number $x_0$ in the interval (0, 1) with the property
that
Show that

$f(1) = \min_{k = 1, 2, \ldots} \frac{k + 1}{1 - a^k},$

$x_0 = \frac{1}{(1 - a) f(1)} = \frac{1 - a^k}{(1 - a)(k + 1)},$

for the minimizing value of k.
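The quantity f(1) can be tabulated directly. The sketch below assumes the following reading of the problem: starting from x = 1, apply operation A k times and then observe, which costs k + 1 operations and leaves the particle in state T with probability $a^k$, so that $f(1) = (k + 1) + a^k f(1)$. The function name `observe_cycle_cost` and the cutoff `k_max` are hypothetical choices for illustration.

```python
def observe_cycle_cost(a, k_max=1000):
    """Return (f1, k, x0) for 0 < a < 1.

    f1 = min over k = 1..k_max of (k + 1) / (1 - a**k),
    k  = the minimizing number of A-operations before observing,
    x0 = 1 / ((1 - a) * f1), the threshold quoted in the problem.
    The search is truncated at k_max, which is an assumption that the
    minimizing k is not larger (it may be if a is very close to 1).
    """
    k = min(range(1, k_max + 1), key=lambda j: (j + 1) / (1.0 - a**j))
    f1 = (k + 1) / (1.0 - a**k)
    x0 = 1.0 / ((1.0 - a) * f1)
    return f1, k, x0


f1, k, x0 = observe_cycle_cost(0.9)
print(k, f1, x0)
```

For a = 0.9 the candidates k = 3, 4, 5 give approximately 14.76, 14.54 and 14.65, so four applications of A before each observation are best in this instance.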


41. At each stage of a sequence of actions, we are allowed our choice of
one of two actions. The first has associated a probability $p_1$ of gaining one
unit, a probability $p_2$ of gaining two units, and a probability $p_3$ of gaining
nothing and terminating the process. The second has a similar set of
probabilities $p_1'$, $p_2'$, $p_3'$. We wish to determine a sequence of choices
which maximizes the probability of attaining at least n units before the
process is terminated.

Let p(n) denote this probability for n = 1, 2, 3, .... Show that p(n)
satisfies the equation

$p(n) = \max [p_1 p(n - 1) + p_2 p(n - 2), \; p_1' p(n - 1) + p_2' p(n - 2)],$

for n = 2, 3, 4, ..., with p(0) = 1, and

$p(1) = \max (p_1, p_1').$


42. With reference to § 7, show that if g(x) and h(x) are quadratic in x,
then $f_N(c) = \alpha_N + \beta_N c + \gamma_N c^2$, where $\alpha_N$, $\beta_N$, $\gamma_N$ are independent of c.

43. Show that there exist recurrence relations of the form

$\alpha_{N+1} = R_1(\alpha_N, \beta_N, \gamma_N),$
$\beta_{N+1} = R_2(\alpha_N, \beta_N, \gamma_N),$
$\gamma_{N+1} = R_3(\alpha_N, \beta_N, \gamma_N),$

where the $R_i$ are rational functions.

44. Treat in a similar way the problem of minimizing the function

$f(x_1, x_2, \ldots, x_N) = \sum_{k=1}^{N} [g(x_k - r_k) + h(x_k - x_{k-1}) + m(x_k - 2x_{k-1} + x_{k-2})],$

where g(x), h(x) and m(x) are quadratic.
45. Suppose that we have a machine whose output per unit time is r(t) as
a function of t, its age measured in the same units. Its upkeep cost per
unit time is u(t) and its trade-in value at any time t is s(t). The purchase
price of a new machine is p > s(0). At each of the times t = 0, 1, 2, ...,
we have the option of keeping the machine, or purchasing a new one.
Consider an unbounded process where the return one stage away is
discounted by a factor a, 0 < a < 1. Let f(t) represent the total overall
return obtained using an optimal policy.

Show that f(t) satisfies the equation

$f(t) = \max [r(t) - u(t) + a f(t + 1), \; s(t) - p + r(0) - u(0) + a f(1)].$
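The replacement equation of Problem 45 lends itself to successive approximation. The sketch below is one way to do this in Python, under an assumption that is not in the text: a machine that reaches the age `max_age` must be replaced, so that only finitely many ages occur. The function name `replacement_values` and the sample functions r, u, s are purely illustrative.

```python
def replacement_values(r, u, s, p, a, max_age, tol=1e-10, max_iter=100000):
    """Iterate f(t) = max[keep, replace] for ages t = 0, ..., max_age.

    keep    = r(t) - u(t) + a * f(t + 1)
    replace = s(t) - p + r(0) - u(0) + a * f(1)
    Assumption: at age max_age (>= 1) keeping is not allowed.
    Returns the value list f and the policy ("keep"/"replace") per age.
    """
    def options(f, t):
        replace = s(t) - p + r(0) - u(0) + a * f[1]
        keep = r(t) - u(t) + a * f[t + 1] if t < max_age else float("-inf")
        return keep, replace

    f = [0.0] * (max_age + 1)
    for _ in range(max_iter):
        new = [max(options(f, t)) for t in range(max_age + 1)]
        change = max(abs(x - y) for x, y in zip(new, f))
        f = new
        if change < tol:
            break
    policy = ["keep" if keep >= rep else "replace"
              for keep, rep in (options(f, t) for t in range(max_age + 1))]
    return f, policy


# Illustrative data only: output falls and upkeep rises with age.
f, policy = replacement_values(
    r=lambda t: 10.0 - t, u=lambda t: float(t),
    s=lambda t: 5.0 * 0.5 ** t, p=8.0, a=0.9, max_age=10)
print(policy)
```

Since p > s(0), replacing a new machine is always worse than keeping it by the amount p - s(0), so the computed policy keeps the machine at age 0.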


46.  Using  the  fact  that  an  optimal  policy,  starting  with  a  new  machine, 
is  to  retain  the  machine  for  a  certain  number  of  time  periods,  and  then 
purchase  another  one,  determine  the  solution  of  the  above  equation. 


47.  Is  it  uniformly  true  that,  if  given  an  over-age  machine,  the  optimal 
policy  is  to  turn  it  in  immediately  for  a  new  one  ? 


48. How does one formulate the problem to take into account
technological improvement in machines and operating procedures?


49.  A  secretary  is  looking  for  a  single  piece  of  correspondence,  ordinarily 
a  carbon  on  thin  paper.  She  usually  has  6  places  she  can  look 

Folder                                     Number k

Three folders of about 30 sheets each      1, 2, 3
One folder of about 50 sheets              4
One folder of about 100 sheets             5
Elsewhere                                  6


The initial probabilities of the letter being in the various places are usually

k     p_k                   1 - q_k                        t_k
      Probability of        Probability of being found     Time for one
      letter in folder      on one examination,            examination
                            if in folder

1     .11                   .95                            1
2     .11                   .95                            1
3     .11                   .95                            1
4     .20                   .85                            2
5     .37                   .70                            3
6     .10                   .10                            100

How shall the secretary look through the folders so as to

a.  Minimize  the  expected  time  required  to  find  a  particular  letter? 

b. Maximize the probability of finding it in a given time?
(F. Mosteller)
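As a starting point for part (a), one may experiment with a simple index rule. The sketch below assumes, without claiming it is the intended solution, that the next place examined is the one maximizing (current probability that the letter is there) × (probability of finding it on one examination) / (time for one examination), with the probabilities revised by Bayes' rule after each unsuccessful examination. The function name `greedy_search_order` and this rule are assumptions introduced for illustration; the numbers are those of the table.

```python
def greedy_search_order(prior, find_prob, time, steps):
    """List the places examined, in order, by the index rule.

    prior[k]:     initial probability that the letter is in place k + 1
    find_prob[k]: probability one examination finds it, if it is there
    time[k]:      time for one examination
    The weights are unnormalized posterior probabilities; normalizing
    them would not change which place has the largest index.
    """
    weight = list(prior)
    order = []
    for _ in range(steps):
        k = max(range(len(weight)),
                key=lambda i: weight[i] * find_prob[i] / time[i])
        order.append(k + 1)               # places are numbered from 1
        weight[k] *= 1.0 - find_prob[k]   # Bayes update after a failure
    return order


prior = [0.11, 0.11, 0.11, 0.20, 0.37, 0.10]
find_prob = [0.95, 0.95, 0.95, 0.85, 0.70, 0.10]
time = [1, 1, 1, 2, 3, 100]
print(greedy_search_order(prior, find_prob, time, 5))
```

Under this rule the three small folders are examined first, since their index .11 × .95 / 1 exceeds .37 × .70 / 3 and .20 × .85 / 2, and the 100-sheet folder precedes the 50-sheet one.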

50. Let the function a(x) satisfy the constraint $a(x) \le d < 1$ for all x.
Show that the solution of the equation

$u = \max_x [b(x) + a(x) u],$

if it exists, is unique, and is given by the expression

$u = \max_x \frac{b(x)}{1 - a(x)}.$

Under  what  conditions  does  the  solution  exist  ? 

If  a  (x)  does  not  satisfy  the  above  condition,  show  that  the  number 
of  solutions  is  either  0,  1,  2  or  a  continuum,  and  give  examples  of  each 
occurrence. 
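The closed form of Problem 50 is easily compared with successive approximations when x ranges over a finite set. The following Python sketch makes the extra assumption $0 \le a(x) \le d < 1$, so that the iteration is a contraction; the function names are illustrative.

```python
def fixed_point_by_iteration(a, b, iters=2000):
    """Iterate u <- max_x [b(x) + a(x) * u] starting from u = 0.

    a, b: equal-length lists holding a(x) and b(x) on a finite set of x,
    assumed to satisfy 0 <= a(x) <= d < 1.
    """
    u = 0.0
    for _ in range(iters):
        u = max(bx + ax * u for ax, bx in zip(a, b))
    return u


def fixed_point_closed_form(a, b):
    """The expression max_x b(x) / (1 - a(x)) from the problem."""
    return max(bx / (1.0 - ax) for ax, bx in zip(a, b))


a_vals = [0.5, 0.9, 0.2]
b_vals = [1.0, 0.3, 2.0]
print(fixed_point_by_iteration(a_vals, b_vals))
print(fixed_point_closed_form(a_vals, b_vals))
```

In this example the three ratios b(x)/(1 - a(x)) are 2, 3 and 2.5, and the iteration settles on the largest of them.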

51. We are given a quantity x > 0 that is to be utilized to perform a
certain task. If an amount y, $0 \le y \le x$, is used on any single attempt,
the probability of success is a(y). If the task is not accomplished on the
first try, we continue with the remaining quantity x - y. Show that if
f(x) represents the over-all probability of success using an optimal policy,
then f(x) satisfies the functional equation

$f(x) = \sup_{0 \le y \le x} [a(y) + (1 - a(y)) f(x - y)].$

52. Derive the corresponding equation for 1 - f(x), the probability of
failure.

53. Consider the two cases where a(y) is convex or concave, and obtain
the explicit solutions for these cases. Observe that in one case there is no
optimal policy.

54. Consider the process discussed in § 2 under the assumption that the
total return from an N-stage process is

$R_N = g(y) + h(x - y) + g(y_1) + h(x_1 - y_1) + \cdots + g(y_{N-1}) + h(x_{N-1} - y_{N-1}) + k(x_N),$

where k(x) is a given function.

55. Consider the functional equation

$f(x) = \max_{0 \le y \le x} [g(y) + h(x - y) + f(ay + b(x - y))],$

under the assumption that

a. $g(y) \sim c_1 y^d$, $h(y) \sim c_2 y^d$, $c_1, c_2, d > 0$, as $y \to \infty$,

or

b. $g(y) \sim c_1 y^{d_1}$, $h(y) \sim c_2 y^{d_2}$, $c_1, c_2, d_1, d_2 > 0$, as $y \to \infty$.

In both cases, determine the asymptotic behavior of f(x) as $x \to \infty$.

56. Determine a recurrence relation for

$\min_{x_i > 0} \left[ \frac{x_1}{x_2 + x_3} + \frac{x_2}{x_3 + x_4} + \cdots + \frac{x_{n-1}}{x_n + x_1} + \frac{x_n}{x_1 + x_2} \right]$

with the introduction of suitable additional parameters.

57. Consider the problem of determining the minimum of the function

$\sum_{k=1}^{N} g_k(r_k, r_{k+1}) + \sum_{k=1}^{N} h_k(r_k),$

where $r_{N+1} = r_1$, and the $r_k$ are subject to the constraints

a. $0 \le r_k \le b_k$,

b. $\sum_{k=1}^{N} \varphi_k(r_k) \ge c$,

with each $\varphi_k(x)$ a known monotone increasing function of x, $\varphi_k(0) = 0$.
Introduce the auxiliary problem:

Minimize

$g(u, r_2) + g(r_2, r_3) + \cdots + g(r_{N-1}, r_N) + g(r_N, v) + \sum_{k=2}^{N} h_k(r_k),$

with $r_2, r_3, \ldots, r_N$ subject to the constraints

a. $0 \le r_k \le b_k$,

b. $\sum_{k=2}^{N} \varphi_k(r_k) \ge c$.

Show that if we designate the above minimum by F(u, v, c), then the
minimum in the original problem is given by

$\min_{0 \le r_1 \le b_1} F(r_1, r_1, c - \varphi_1(r_1)).$
58. Introduce the sequence of functions, R = 2, 3, ..., N - 1,

$F_R(u, v, c) = \min [g(u, r_R) + g(r_R, r_{R+1}) + \cdots + g(r_{N-1}, r_N) + g(r_N, v) + \sum_{k=R}^{N} h_k(r_k)],$

with

$F_N(u, v, c) = \min_{r_N} [g(u, r_N) + g(r_N, v) + h_N(r_N)].$

For each R, admit only c-values satisfying the restriction
$\sum_{k=R}^{N} \varphi_k(b_k) \ge c$, where the $b_k$ are fixed positive constants.

Show that we have the recurrence relation

$F_R(u, v, c) = \min_{r_R} [g(u, r_R) + h_R(r_R) + F_{R+1}(r_R, v, c - \varphi_R(r_R))],$

where $r_R$ varies over the interval defined by

a. $0 \le r_R \le b_R$,

b. $\sum_{k=R+1}^{N} \varphi_k(b_k) \ge c - \varphi_R(r_R)$.

59. Consider in a similar fashion the problem of minimizing a function
such as

$F_N = g(r_1, r_2, r_3) + g(r_2, r_3, r_4) + \cdots + g(r_{N-1}, r_N, r_1) + g(r_N, r_1, r_2).$

60. Suppose that we have a quantity of capital x, and a choice of the
production in varying quantities of N different products. Assume
initially that there is an unlimited supply of labor and machines for the
production of any items we choose, in any quantities we wish.

If we decide to produce a quantity $x_i$ of the ith item, we incur the
following costs:

a. $a_i$ = unit cost of raw materials required for the ith item,

b. $b_i$ = unit cost of machine production of the ith item,

c. $c_i$ = unit cost of labor required for the ith item,

d. $C_i$ = a fixed cost, independent of the amount produced of the ith item, if $x_i > 0$.

The cost of producing a quantity $x_i$ of the ith item is then

$g_i(x_i) = (a_i + b_i + c_i) x_i + C_i, \quad x_i > 0,$
$g_i(x_i) = 0, \quad x_i = 0.$

Let $p_i$ be the selling price per unit of the ith item. The problem is to
choose the $x_i$ so as to maximize the total profit

$P_N = \sum_{i=1}^{N} p_i x_i,$

subject to the constraints

(a) $\sum_{i=1}^{N} g_i(x_i) \le x$,

(b) $x_i \ge 0$.

Let

$f_N(x) = \max P_N.$

Show that

$f_1(x) = p_1 (x - C_1)/(a_1 + b_1 + c_1), \quad x \ge C_1,$
$f_1(x) = 0, \quad 0 \le x \le C_1,$

and

$f_N(x) = \max_{x_N \ge 0} [p_N x_N + f_{N-1}(x - g_N(x_N))].$

Show that $x_N \ge 0$ can be replaced by

$x_N \ge \frac{f_{N-1}(x) - f_{N-1}(x - C_N)}{p_N}.$

61. Assume that the demand for each item is stochastic. Let $G_k(z)$
represent the cumulant function for the demand z for the kth item. Show that
the expected return from the manufacture of $x_k$ of the kth item is

$p_k \int_0^{x_k} z \, dG_k(z) + p_k x_k \int_{x_k}^{\infty} dG_k(z) = p_k \int_0^{x_k} z \, dG_k(z) + p_k x_k (1 - G_k(x_k)),$

and obtain the recurrence relation corresponding to the problem of
maximizing the total expected return.

62. Consider the problem of maximizing the probability that the return
exceed r.

63.  Consider  the  above  problem  in  the  deterministic  and  stochastic 
versions  when  there  are  restrictions  upon  the  quantity  of  machines 
available  and  the  labor  supply. 
64. Obtain the recurrence relations corresponding to the case where we
have "complementarity" constraints such as

a.  .r,  x,  =  0,  x,  x ,,  =  0,  *,  xlt  xn  =  0, 
and so on, or

b. $x_i x_{i+1} = 0$, $i = 1, 2, \ldots, N - 1$.

65. Suppose that we have a complicated mechanism consisting of N
interacting parts. Let the ith part have weight $W_i$, size $S_i$, and let us
assume that we know the probability distribution for the length of time
that any particular part will go without a breakdown, necessitating a new
part. Assume also that we know the time and cost required for
replacement, and the cost of a breakdown. Assuming that there are weight and
size limitations on the total quantity of spare parts we are allowed to
stock, how do we stock so as to minimize

a.  the  expected  time  lost  due  to  breakdowns, 

b.  the  expected  cost  of  breakdowns, 

c.  a  given  function  of  the  two,  time  and  cost, 

d.  the  probability  that  the  time  lost  due  to  breakdowns  will  exceed  T, 

e.  the  probability  that  the  cost  due  to  breakdowns  will  exceed  C? 

66. Determine the possible modes of asymptotic behavior of the sequence
$\{u_n\}$ determined by the recurrence relation

$u_{n+1} = \max [a u_n + b, \; c u_n + d],$

and generally by the recurrence relation

$u_{n+1} = \max_i [a_i u_n + b_i], \quad i = 1, 2, \ldots, k$

(cf. Problem 50).
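A few numerical experiments help in guessing the possible modes of behavior. The sketch below simply iterates the recurrence $u_{n+1} = \max_i [a_i u_n + b_i]$; the function name `iterate_max_affine` and the two sample coefficient sets are illustrative assumptions.

```python
def iterate_max_affine(coeffs, u0, n):
    """Return [u_0, u_1, ..., u_n] for u_{m+1} = max_i (a_i * u_m + b_i).

    coeffs: list of (a_i, b_i) pairs.
    """
    us = [u0]
    for _ in range(n):
        us.append(max(a * us[-1] + b for a, b in coeffs))
    return us


# All a_i < 1: the sequence converges to max_i b_i / (1 - a_i) = 8/3.
print(iterate_max_affine([(0.5, 1.0), (0.25, 2.0)], 0.0, 100)[-1])
# Some a_i > 1: geometric growth, here u_n = 2**n.
print(iterate_max_affine([(2.0, 0.0), (1.0, 1.0)], 1.0, 5))
```

The first case is the situation of Problem 50; in the second, the term with the largest slope eventually dominates.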

67. Determine the minimum of

$F(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} g_i(x_i) + \max (x_1, x_2, \ldots, x_N),$

subject to the constraints $x_i \ge 0$.

68. Suppose that we have N different activities in which to invest capital.
Let $g_i(x_i)$ be the return from the ith activity due to an investment of $x_i$.
Given an initial quantity of capital x, we are required to invest in at
most k activities so as to maximize the total return.

Denote the maximum return by $f_{k,N}(x)$. Show that we have the
recurrence relation

$f_{k,N}(x) = \max \left[ \max_{0 \le y \le x} [g_N(y) + f_{k-1,N-1}(x - y)], \; f_{k,N-1}(x) \right],$

for $1 \le k \le N - 1$.

69. Two corporations, with interlocking directorate, are forbidden by
anti-monopoly statutes from investing in the same enterprise. The first
corporation has capital x to invest, the second capital y, with known
returns $g_i(z)$ from an investment of a quantity of capital z in the ith of
N different enterprises.

Show that if the directors wish to maximize the total return from the
two corporations, they must maximize

$F_N(x_i, y_i) = \sum_{i=1}^{N} g_i(x_i) + \sum_{i=1}^{N} g_i(y_i),$

subject to the constraints

a. $\sum_{i=1}^{N} x_i = x$, $x_i \ge 0$,

b. $\sum_{i=1}^{N} y_i = y$, $y_i \ge 0$,

c. $x_i y_i = 0$.

Let

$f_N(x, y) = \max_{\{x_i, y_i\}} F_N(x_i, y_i).$

Show that

$f_N(x, y) = \max \left[ \max_{0 \le y_N \le y} [g_N(y_N) + f_{N-1}(x, y - y_N)], \; \max_{0 \le x_N \le x} [g_N(x_N) + f_{N-1}(x - x_N, y)] \right].$


Consider  the  case  where  the  different  corporations  derive  different 
returns  from  the  same  enterprise. 


70.  It  is  decided  to  employ  a  policy  of  replacing  all  light  bulbs  in  an 
office  building  at  one  time.  Assume  that  the  cost  of  replacing  the  bulbs 
is  a,  and  that  g  (x)  represents  the  cost  due  to  lack  of  lighting  if  a  time 
interval  x  elapses  between  replacements.  Over  a  time  interval  T,  it  is 
decided  to  make  replacements  at  times  Xi,  Xi  -f  x>,  .  .  .,  xi  4-  *2  4-  •  ■  •  4~ 
4-  Xn  l ,  where  n  is  to  be  determined. 
The efficiency of the program is to be measured by the average loss
sustained

£  (a  +  K  (*i)  ) 

/•'  (xi,  As,  . .  . ,  a„)  =  — 

What  is  the  optimal  policy? 

(I.  R.  Savage) 

71. Let the functions $g_i(x)$ be such that the maximum of

$F_N(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} g_i(x_i)$

over the region $x_i \ge 0$, $\sum_{i=1}^{N} x_i = c$ may be obtained by use of a Lagrange
multiplier $\lambda$, considering the expression

$G_N = \sum_{i=1}^{N} g_i(x_i) - \lambda \sum_{i=1}^{N} x_i.$

On the other hand, let $f_N(c) = \max_{\{x_i\}} F_N$. Show that

$\lambda = f_N'(c).$

Obtain the corresponding result for the maximum of $\sum_{i=1}^{N} g_i(x_i, y_i)$
subject to

$\sum_{i=1}^{N} x_i = c_1$, $\sum_{i=1}^{N} y_i = c_2$, $x_i, y_i \ge 0$.

72. Let

$M_r(x_1, x_2, \ldots, x_N)$ = the rth largest of the quantities $x_1, x_2, \ldots, x_N$,

$m_r(x_1, x_2, \ldots, x_N)$ = the rth smallest of the quantities $x_1, x_2, \ldots, x_N$,

for r = 1, 2, ..., N. Obtain recurrence relations connecting the members
of the sequences

$\{M_r(x_1, x_2, \ldots, x_N)\}$, $\{m_r(x_1, x_2, \ldots, x_N)\}$, r = 1, 2, ....

X 

73.  Consider  the  problem  of  maximizing  £  2  zi 

i  -  1 


subject  to  the  constraints  a<  )>  0,  £  1/(1  +  a <)  <  a. 

(J.  V.  Whittaker) 
74. A gambler has a capital of x dollars and wishes to bet on the outcomes
of N different events. There is a probability $p_k$ that he can predict the
kth outcome correctly. The only constraint on the total amount that he
bets is the condition that he be able to pay off his losses.

Show that the problem of maximizing his expected return may be
converted into the problem of maximizing
$L_N(x) = \sum_{i=1}^{N} p_i x_i$ subject to the constraints

(a) $x_i \ge 0$,

(b) $\sum_{i=1}^{N} x_i \le x + x_j$, $j = 1, 2, \ldots, N$.

75.  Consider  the  problem  of  maximizing 

.v 

Ls  (a)  =  2.'  pk  xi 

k  «*  1 

subject  to  the  constraints 

(a)  .v<  >  0 
.v 

(b)  1'  A |  <,H  +  X j 


(c)  2;  ac i  ^  r. 

t  -  i 

Define  Js  (n,  v)  =  Max  Tv  (v).  Show  that 

fs  («,  v)  Max  px  x x  -f  fs  -i  (><  v.v,  Min  (v  —  v.v,  u)  )  ] 


76. The problem of designing an efficient water distillation plant for
heavy water production involves the minimization of


Vs 


K  ("i) 


g  ((/,.)  _  g(«j)  j  g  («>") 

«I  rti  ll’  (It  (1-2  ...  ti  ,n  _  1 


where  the  at  an-  subject  to  the  constraints 
(a)  </,  '>  1 

(!>)  </j  1 1  >  ...  Um  v. 


Show that this may be reduced to the functional equation


A  *  i  (A 


Min  Ig  (</,) 
«,  i  L 


and  find  the  solution  m  the  case  where  g  (v)  yh,  h  >  0. 
77. Consider the case where

$V_N = g_1(a_1) + \frac{g_2(a_2)}{a_1} + \cdots + \frac{g_m(a_m)}{a_1 a_2 \cdots a_{m-1}}.$

(K. Cerri, M. Silvestri and S. Villan, "The Cascading Problem in a Water
Distillation Plant and Heavy Water Production," Z. Naturforschg., 11a,
694 (1956).)

78. Consider the problem of allocating resources to N different activities,
leading to the problem of maximizing a function

$\sum_{i=1}^{N} g_i(x_i)$ subject to the constraints $\sum_{i=1}^{N} x_i = c$, $x_i \ge 0$.

Show that the function $f_N(c)$ obtained via the usual recurrence relations
does not depend upon the way in which the activities are numbered.


Bibliography  and  Comments  for  Chapter  I 

§ 1. A fairly complete bibliography of papers up to 1954 plus some remarks
which complement the text may be found in R. Bellman, "The Theory of
Dynamic Programming," Bull. Amer. Math. Soc., vol. 60 (1954), pp. 503-516.
37-48

§ 7. Further discussion of this problem may be found in R. Bellman,
"A Class of Variational Problems," Quart. of Appl. Math., 1956. An
interesting discussion of general "smoothing" problems may be found in I. J.
Schoenberg, "On Smoothing Functions and their Generating Functions,"
Bull. Amer. Math. Soc., vol. 59 (1953), pp. 199-230, where a number of
further references may be found.

§ 11. The importance of the concept of approximation in policy space
was stressed in R. Bellman, "On Computational Problems in the Theory
of Dynamic Programming," Symposium on Numerical Methods, Amer. Math.
Soc., Santa Monica, 1953.

§ 12. The elegant proof of Lemma 1 was found independently by I. Glicksberg
and W. Fleming, to whom the author posed the problem of finding a
better proof than that given in the opening lines of the section.

§ 17. The results in this section were derived by D. Anderson.

§ 18. A more complete discussion of the concept of the stability of solutions
of functional equations may be found in R. Bellman, Stability Theory of
Differential Equations, McGraw-Hill, 1954.

§ 19. The reduction of the sequence $\{f_{k,N}(x)\}$ to a sequence $\{f_k(x)\}$ is an
important piece of mathematical legerdemain as far as computational
solutions are concerned; cf. also § 6 and § 7. The limited storage capacity
of computing machines makes one quite stingy with subscripts and
parameters.
§ 22. The proof in the text follows a paper of S. Johnson, "Optimal
Search is Fibonaccian," 1955 (to appear).

An equivalent result was found earlier by J. Kiefer, unbeknownst to
Johnson, using a much more difficult argument: J. Kiefer, "Sequential
Minimax Search for a Maximum," Proc. Amer. Math. Soc., vol. 4 (1953),
pp. 502-6.

The  problem  of  determining  a  corresponding  result  for  higher  dimensions 
seems  extraordinarily  difficult,  and  nothing  is  known  in  this  direction  at 
the  present  time. 

§  24.  An  excellent  introduction  to  the  study  of  stochastic  processes  is 
given  in  the  book  by  W.  Feller,  Probability  Theory,  John  Wiley  and  Sons, 
1948.  A  number  of  important  physical  processes  are  discussed  in  the  book 
by  M.  S.  Bartlett,  An  introduction  to  stochastic  processes  with  special  reference 
to  methods  and  applications,  Cambridge,  1955. 

Exercise  76.  See  R.  Bellman,  Nuclear  Engineering,  1957 
A Stochastic Multi-Stage Decision Process

§ 1. Introduction

In  the  preceding  chapter  we  considered  in  some  detail  a  multi-stage 
decision  process  in  both  deterministic  and  stochastic  guises.  In  this 
chapter  we  shall  discuss  a  stochastic  multistage  decision  process  of  an 
entirely  different  type  which  possesses  a  number  of  interesting  features.  In 
particular,  in  obtaining  the  solution  of  some  simple  versions  of  processes 
of  this  type,  we  shall  encounter  the  important  concept  of  “decision 
regions”. 

We shall follow essentially the same lines pursued in the previous
chapter: first a statement of the problem, then a brief discussion in
classical terms. Following this, the problem will be formulated in terms of a
functional equation, the required existence and uniqueness theorems will
be proved, and the remainder of the chapter will be devoted to a discussion
of various properties of the solution, such as stability and analytic
structure.

For the simple process used as our model, we are fortunate enough to
obtain a solution which has a very interesting interpretation. Equally
fortunately, as far as the mathematical interest of the problem is
concerned, this solution does not extend to more general processes of the same
type. This forces us to employ techniques of an entirely different type
which we shall discuss in a later chapter, Chapter 8.

The failure of the elementary solution is not due solely to the
inadequacy of the analysis. A counter-example has been constructed showing
that the solution of a multi-stage decision process of this class cannot
always have the simple form of the solution given in § 8 below. Another
proof of this fact is furnished by Lemma 8 of Chapter 8.

A  number  of  interesting  results  which  we  do  not  wish  to  discuss  in 
detail  are  given  as  exercises  at  the  end  of  the  chapter. 

§  2.  Stochastic  gold-mining 

We  shall  cast  the  problem  in  the  mold  of  a  gold-mining  process. 

Suppose that we are fortunate enough to own two gold mines,
Anaconda and Bonanza, the first of which possesses within its depths an
amount of gold x, and the second an amount of gold y. In addition, we
have a single, rather delicate, gold-mining machine with the property
that if used to mine gold in Anaconda, there is a probability $p_1$ that it
will mine a fraction $r_1$ of the gold there and remain in working order, and
a probability $(1 - p_1)$ that it will mine no gold and be damaged beyond
repair. Similarly, Bonanza has associated the corresponding probabilities
$p_2$ and $1 - p_2$, and fraction $r_2$.

We begin the process by using the machine in either the Anaconda or
Bonanza mine. If the machine is undamaged after its initial operation,
we again make a choice of using the machine in either of the two mines,
and continue in this way, making a choice before each operation, until the
machine is damaged. Once the machine is damaged, the operation
terminates, which means that no further gold is obtained from either mine.

What  sequence  of  choices  maximizes  the  amount  of  gold  mined  before 
the  machine  is  damaged  ? 

§  3.  Enumerative  treatment 

Since we are dealing with a stochastic process, it is not possible to talk
about the return from a policy, a point we have already discussed in § 24
of the previous chapter, nor can we choose a policy which guarantees a
maximum return. We must console ourselves with measuring the value
of a policy by means of some average of the possible returns, and choosing
an optimal policy on this basis. As before, the simplest such average is
the expected value.

Let us then agree that we are interested in the policies (since there may
be many) which maximize the expected amount of gold mined before the
machine is damaged. A policy here will consist of a choice of A's and B's,
A for Anaconda and B for Bonanza. However, any sequence such as

(1)  S = AABBBABB ...

must be read: A first, then A again if the machine is undamaged, then B
if the machine is still undamaged, and so on.

Let us initially, to avoid the conceptual difficulties inherent in
unbounded processes, consider only mining operations which terminate
automatically after N steps regardless of whether the machine is
undamaged or not. In this case it is quite easy, in theory, to list all feasible
policies, and to compute all possible returns.¹ It is possible to use this
idea to some extent in certain problems. However, in general, this
procedure is rather limited in application, unrevealing as to the structure of an
optimal policy, and, as a brute force method, a betrayal of one's
mathematical birthright.

¹ To quote numbers again, a 10-stage policy would require the listing of $2^{10} =
1024$ possible policies; if three choices at each stage, then $3^{10} = 59,049$ different policies.
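For small N the enumerative treatment is easily carried out by machine. The following Python sketch lists all $2^N$ sequences of A's and B's and evaluates the expected amount of gold mined under each; the function names `expected_return` and `best_by_enumeration`, and the numerical values in the example, are illustrative assumptions.

```python
from itertools import product


def expected_return(seq, x, y, p1, r1, p2, r2):
    """Expected gold mined when the fixed sequence seq of 'A'/'B' is used."""
    alive = 1.0      # probability that the machine is still undamaged
    total = 0.0
    for choice in seq:
        if choice == "A":
            alive *= p1
            total += alive * r1 * x   # gold obtained only if it survives
            x *= 1.0 - r1             # gold left in Anaconda
        else:
            alive *= p2
            total += alive * r2 * y
            y *= 1.0 - r2             # gold left in Bonanza
    return total


def best_by_enumeration(n, x, y, p1, r1, p2, r2):
    """Brute force over all 2**n sequences; returns (value, sequence)."""
    return max(
        (expected_return(seq, x, y, p1, r1, p2, r2), "".join(seq))
        for seq in product("AB", repeat=n)
    )


print(best_by_enumeration(10, 100.0, 100.0, 0.8, 0.5, 0.6, 0.9))
```

The work doubles with each additional stage, which is the limitation referred to in the text.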
§ 4. Functional equation approach

In place of the above enumerative approach, we shall once again
employ the functional equation approach. Let us define

(1)  $f_N(x, y)$ = expected amount of gold mined before the machine
     is damaged when A has x, B has y and an optimal
     policy which can last at most N stages is employed.

Considering the one-stage process, we see that an A-choice yields an
expected amount $p_1 r_1 x$, while a B-choice yields $p_2 r_2 y$. Hence

(2)  $f_1(x, y) = \max [p_1 r_1 x, \; p_2 r_2 y].$

Let us now consider the general (N + 1)-stage process. Whatever choice
is made first, the continuation over the remaining N stages must be
optimal if we wish to obtain an optimal (N + 1)-stage policy. Hence the
total expected return from an A-choice is

(3)  $f_A(x, y) = p_1\,[\,r_1 x + f_N((1 - r_1)x,\, y)\,]$,

and the total expected return from a B-choice is

(4)  $f_B(x, y) = p_2\,[\,r_2 y + f_N(x,\, (1 - r_2)y)\,]$.

Since we wish to maximize our total (N + 1)-stage return, we obtain the
basic recurrence relation

(5)  $f_{N+1}(x, y) = \max\,[\,f_A(x, y),\; f_B(x, y)\,]$
     $= \max\,[\,p_1(r_1 x + f_N((1 - r_1)x,\, y)),\; p_2(r_2 y + f_N(x,\, (1 - r_2)y))\,]$.
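
For small N the recurrence (5) can be compared directly with the enumerative approach it replaces. The Python sketch below is only an illustration: the function names, the convention $f_0 = 0$ (which reproduces (2) for N = 1), and the numerical parameter values are assumptions introduced here and do not come from the text.

```python
from itertools import product


def f(n, x, y, p1, r1, p2, r2):
    """Optimal expected return of the n-stage process via recurrence (5).

    Assumed convention: f_0 = 0, so that f_1 = max(p1*r1*x, p2*r2*y).
    """
    if n == 0:
        return 0.0
    a = p1 * (r1 * x + f(n - 1, (1 - r1) * x, y, p1, r1, p2, r2))
    b = p2 * (r2 * y + f(n - 1, x, (1 - r2) * y, p1, r1, p2, r2))
    return max(a, b)


def policy_return(policy, x, y, p1, r1, p2, r2):
    """Expected return of one fixed sequence of choices such as "AAB"."""
    total, alive = 0.0, 1.0  # alive = probability the machine is undamaged
    for choice in policy:
        if choice == "A":
            alive *= p1
            total += alive * r1 * x
            x *= 1 - r1
        else:
            alive *= p2
            total += alive * r2 * y
            y *= 1 - r2
    return total


def brute_force(n, x, y, p1, r1, p2, r2):
    """Best expected return over all 2**n sequences of A's and B's."""
    return max(
        policy_return(s, x, y, p1, r1, p2, r2)
        for s in product("AB", repeat=n)
    )


params = dict(p1=0.7, r1=0.5, p2=0.5, r2=0.8)  # hypothetical values
for n in range(1, 9):
    dp = f(n, 3.0, 2.0, **params)
    bf = brute_force(n, 3.0, 2.0, **params)
    assert abs(dp - bf) < 1e-12
```

Both routines visit $2^N$ branches here, so the sketch only confirms that the two approaches agree; it is the structure of the recurrence, not this naive recursion, that is exploited in the sections that follow.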


§ 5. Infinite stage approximation

The same argumentation shows that the return from the unbounded process,
which we call $f(x, y)$, assuming that it exists, satisfies the functional
equation

(1)  $f(x, y) = \max\,[\,p_1(r_1 x + f((1 - r_1)x,\, y)),\; p_2(r_2 y + f(x,\, (1 - r_2)y))\,]$.

Once again, the infinite process is to be considered as an approximation
to a finite process with large N. In return for the advantage of having
only a single function to consider, we face the necessity of establishing
the existence and uniqueness of a solution of the equation in (1). This we
proceed to do in the next section.

§ 6. Existence and uniqueness

Let us now prove the following result:

Theorem 1. Assume that

(1)  a. $|p_1|,\ |p_2| < 1$,²

     b. $0 \le r_1,\ r_2 < 1$.

Then there is a unique solution to (5.1) which is bounded in any rectangle
$0 \le x \le X$, $0 \le y \le Y$. This solution $f(x, y)$ is continuous in
any finite part of the region $x, y \ge 0$.

Proof: Let us, to simplify the notation, set

(2)  $T_1(f) = p_1\,[\,r_1 x + f((1 - r_1)x,\, y)\,]$,
     $T_2(f) = p_2\,[\,r_2 y + f(x,\, (1 - r_2)y)\,]$.

Then the functional equation in (5.1) has the form

(3)  $f(x, y) = \max\,[\,T_1(f),\; T_2(f)\,]$.

Define the sequence of functions

(4)  $f_1(x, y) = \max\,[\,p_1 r_1 x,\; p_2 r_2 y\,]$,
     $f_{N+1}(x, y) = \max\,[\,T_1(f_N),\; T_2(f_N)\,] = \max_{i = 1, 2}\, T_i(f_N)$,

precisely as in the recurrence relation of (4.5).

Let $i = i(N) = i(N, x, y)$ be an index which yields the maximum in the
expression $\max_{i = 1, 2}\, T_i(f_N)$, for N = 1, 2, .... Then we have

(5)  $f_{N+1}(x, y) = T_{i(N)}(f_N) \ge T_{i(N+1)}(f_N)$,
     $f_{N+2}(x, y) = T_{i(N+1)}(f_{N+1}) \ge T_{i(N)}(f_{N+1})$,

using the same device we employed in the course of the existence and
uniqueness proof for the solution of the functional equation in (6.1) of
Chapter 1.

² In the equation arising from the process described above, the $p_i$ and
$r_i$ are non-negative. The proof we give covers the more general equation
as well.

Hence

(6)  $|f_{N+1}(x, y) - f_{N+2}(x, y)|$
     $\le \max\,[\,|T_{i(N)}(f_N) - T_{i(N)}(f_{N+1})|,\; |T_{i(N+1)}(f_N) - T_{i(N+1)}(f_{N+1})|\,]$
     $\le \max_{i = 1, 2}\, |T_i(f_N) - T_i(f_{N+1})|$
     $\le \max\,[\,|p_1|\,|f_N((1 - r_1)x,\, y) - f_{N+1}((1 - r_1)x,\, y)|,\; |p_2|\,|f_N(x,\, (1 - r_2)y) - f_{N+1}(x,\, (1 - r_2)y)|\,]$.

Let us now define

(7)  $u_N(x, y) = \max_{0 \le s \le x,\; 0 \le t \le y} |f_N(s, t) - f_{N+1}(s, t)|$.

From (6) we obtain

(8)  $u_{N+1}(x, y) \le q\, u_N(x, y)$,

where $q = \max\,(|p_1|, |p_2|)$. Since $0 \le q < 1$, we see that the series
$\sum_{N=1}^{\infty} u_N(x, y)$ converges uniformly in any bounded rectangle
$0 \le x \le X$, $0 \le y \le Y$. Hence $f_N(x, y)$ converges uniformly to a
function $f(x, y)$ which satisfies the relation (5.1), and which is
continuous in any bounded rectangle in the (x, y)-plane.

The uniqueness proof follows the same lines as the proof of Theorem 1
of Chapter 1 and is left as an exercise for the reader.

As we see from the above proof, the choice of $f_1(x, y)$ is arbitrary
provided only that it be bounded in any finite rectangle. It is interesting
to note that the limit function will be continuous even if the initial
function is not, as a consequence of the uniqueness of the solution.
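
The geometric estimate behind (8) can be observed numerically. The following Python sketch is an illustration under stated assumptions: the starting function is taken to be $f_0 = 0$ (permissible, since the initial function is arbitrary), the helper name `make_f` and the parameter values are hypothetical, and the check is made at a single point rather than over a rectangle. With this start the successive differences satisfy $0 \le f_{N+1} - f_N \le q^N f_1$.

```python
from functools import lru_cache


def make_f(x, y, p1, r1, p2, r2):
    """Return f(n), the n-th successive approximation at (x, y), with f_0 = 0.

    States reached from (x, y) have the form (x*(1-r1)**i, y*(1-r2)**j),
    so the recursion is memoised on the integer triple (n, i, j).
    """
    @lru_cache(maxsize=None)
    def f(n, i=0, j=0):
        if n == 0:
            return 0.0
        xs, ys = x * (1 - r1) ** i, y * (1 - r2) ** j
        a = p1 * (r1 * xs + f(n - 1, i + 1, j))
        b = p2 * (r2 * ys + f(n - 1, i, j + 1))
        return max(a, b)

    return f


p1, r1, p2, r2 = 0.7, 0.5, 0.5, 0.8  # hypothetical values
f = make_f(3.0, 2.0, p1, r1, p2, r2)
q = max(p1, p2)
gaps = [f(n + 1) - f(n) for n in range(40)]
# Successive differences are non-negative and bounded by q**n * f_1(x, y).
assert all(0.0 <= g <= q ** n * f(1) + 1e-12 for n, g in enumerate(gaps))
```

The memoisation on (n, i, j) keeps the number of evaluated states polynomial in N, in contrast with the $2^N$ sequences of the enumerative approach.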

§ 7. Approximation in policy space and monotone convergence

As before, it is easily seen that we can ensure monotone convergence
by approximation in policy space, in the case where $p_1, p_2 \ge 0$. The two
simplest approximations are those corresponding to $A^\infty$ and $B^\infty$.³
From the first policy we obtain the expected return

(1)  $f_A(x, y) = p_1 r_1 x / (1 - p_1(1 - r_1))$,

and from the second, the return

(2)  $f_B(x, y) = p_2 r_2 y / (1 - p_2(1 - r_2))$.

³ It is interesting to observe the following difference between the process
and the functional equation obtained from it. The sequence $A^\infty$ is
conditional as far as the process is concerned, but deterministic as far
as the equation is concerned.

As we shall see below in § 8 and § 9, we actually possess a far more
sophisticated technique for obtaining a first approximation in the
discussion of more complicated processes, at the expense, of course, of the
above simplicity of expression. The guiding principle is, however, quite
simple.
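
The closed forms (1) and (2) are sums of geometric series, since working a single mine repeatedly multiplies both the survival probability and the remaining gold by fixed factors. A minimal Python sketch, with hypothetical function names and parameter values, compares a long truncated sum with the closed form:

```python
def repeated_choice_return(p, r, amount, stages):
    """Expected return from working one mine for `stages` consecutive stages."""
    total, alive = 0.0, 1.0
    for _ in range(stages):
        alive *= p                   # machine survives this stage
        total += alive * r * amount  # gold mined at this stage
        amount *= 1 - r              # gold left in the mine
    return total


def closed_form(p, r, amount):
    """Limit of the above as stages -> infinity: p r x / (1 - p (1 - r))."""
    return p * r * amount / (1 - p * (1 - r))


p1, r1, x = 0.7, 0.5, 3.0  # hypothetical values
assert abs(repeated_choice_return(p1, r1, x, 200) - closed_form(p1, r1, x)) < 1e-12
```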

§ 8. The solution

Let us now turn to the solution of the equation in (5.1) for the case
where $p_1$ and $p_2$ are real numbers satisfying the inequality
$0 \le p_1,\ p_2 < 1$. It is intuitively clear that an A-choice is made when
$x/y \gg 1$ and a B-choice is made when $y/x \gg 1$.⁴

It is also easily seen that the choice at each stage depends only on the
ratio x/y, since $f(kx, ky) = k f(x, y)$ for k > 0. Perhaps the quickest
way to prove this is to invoke the uniqueness theorem, although it is
intuitively clear from the description of the process.

It follows then that if we examine the positive (x, y)-quadrant, and
divide it into an A-set and a B-set, which is to say those values of x and
y at which an A-decision is the optimal first choice and those at which
the B-decision is optimal, then (x, y) in the A-set implies that (kx, ky)
is in the A-set for all k > 0, and similarly for the B-set.

If these sets are well-behaved, it follows that their boundaries must be
straight lines,

[Figure 1]

as conceivably in the figure above. The regions where A and B are used
are called decision regions.

Let us now boldly conjecture that there are only two regions, as in
Figure 2,

[Figure 2]

and see if we can determine the boundary line, L, if this is the case.

⁴ The notation $a \gg 1$ signifies that a is very large compared to 1.

What is the essential feature of the boundary line which will enable us
to determine its equation? It is this: it is the line on which A and B
choices are equally optimal.

If we use A at a point (x, y), with an optimal continuation from the
first stage on, we have

(1)  $f_A(x, y) = p_1 r_1 x + p_1 f((1 - r_1)x,\, y)$,

while similarly B at (x, y), and an optimal continuation, yield

(2)  $f_B(x, y) = p_2 r_2 y + p_2 f(x,\, (1 - r_2)y)$.

Equating these two expressions we obtain the equation for L. Unfortunately,
this equation as it stands is of little use since it involves the unknown
function f.

In order to complete the analysis successfully we must make a further
observation. When at a point on L we employ A, we decrease x while
keeping y constant and hence enter the B region; similarly, if we use B
at a point on L we enter the A region (see Figure 2 above). It follows that
for a point on L an initial first choice of A is equivalent to an initial
first and second choice of A and then B, while, conversely, an initial
first choice of B is equivalent to an initial first and second choice of B
and then A.

If we use A and then B and continue optimally, we have

(3)  $f_{AB}(x, y) = p_1 r_1 x + p_1 p_2 r_2 y + p_1 p_2 f((1 - r_1)x,\, (1 - r_2)y)$,

and similarly

(4)  $f_{BA}(x, y) = p_2 r_2 y + p_1 p_2 r_1 x + p_1 p_2 f((1 - r_1)x,\, (1 - r_2)y)$.

Equating $f_{AB}$ and $f_{BA}$, the unknown function f disappears⁵ and we
obtain the equation

(5)  $p_1 r_1 x / (1 - p_1) = p_2 r_2 y / (1 - p_2)$,

for L.

⁵ The meaning of this is that having survived both an A choice and a B
choice, it is no longer of any importance in the continuation of the
process as to the original order of these choices.
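
For the reader who wishes to see the cancellation explicitly, the step from (3) and (4) to (5) may be written out as follows. Nothing beyond the expressions (3) and (4) and the assumption $0 \le p_1, p_2 < 1$ is used.

```latex
\begin{align*}
f_{AB}(x,y) - f_{BA}(x,y)
  &= p_1 r_1 x + p_1 p_2 r_2 y - p_2 r_2 y - p_1 p_2 r_1 x \\
  &= p_1 r_1 (1 - p_2)\, x - p_2 r_2 (1 - p_1)\, y .
\end{align*}
Setting the difference equal to zero and dividing by $(1 - p_1)(1 - p_2) > 0$ gives
\[
  \frac{p_1 r_1 x}{1 - p_1} = \frac{p_2 r_2 y}{1 - p_2} .
\]
```

The same computation shows that $f_{AB} > f_{BA}$ exactly when $p_1 r_1 x/(1 - p_1) > p_2 r_2 y/(1 - p_2)$, which is the form in which the comparison is used in the rigorous argument.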

It remains to establish this equation rigorously. Let us begin by proving
that there is a region near the x-axis where A is always the optimal first
choice.

If y = 0, we have

(6)  $f(x, 0) = \max\,[\,p_1 r_1 x + p_1 f((1 - r_1)x,\, 0),\; p_2 f(x, 0)\,]$
     $= p_1 r_1 x + p_1 f((1 - r_1)x,\, 0)$.

Since $f(x, y)$ is continuous in y, it follows that

(7)  $f(x, y) > p_2\,[\,r_2 y + f(x,\, (1 - r_2)y)\,]$,

for $0 \le y \le kx$, where k is some small positive constant, since the
strict inequality holds for y = 0.

Thus we have a region in which A is used first, shown below in Figure 3.

[Figure 3]

Let us now take a point P = P(x, y), in the region between L and
y = kx, with the property that $(x, (1 - r_2)y)$ is in the shaded region. In
other words, use of B at P must result in an A-choice next, provided that
the machine is undamaged. (This proviso is necessary when discussing the
process, but not when discussing the equation, as we have noted above.)

If B is optimal at P, we obtain

(8)  $f(x, y) = f_{BA}(x, y)$,

as given by (4). However, we know that below L, $f_{BA}(x, y) < f_{AB}(x, y)$.
Hence B cannot be optimal at P. Proceeding inductively in this fashion
we extend the shaded region up to L. Since precisely the same argument
shows that the region between L and the y-axis is a B-region, we have
completed the proof of


Theorem 2. Consider the equation

(9)  $f(x, y) = \max\,[\,p_1[r_1 x + f((1 - r_1)x,\, y)],\; p_2[r_2 y + f(x,\, (1 - r_2)y)]\,]$,  $x, y \ge 0$,

where $0 \le p_1,\ p_2 < 1$, $0 \le r_1,\ r_2 \le 1$.
The solution is given by

(10)  $f(x, y) = p_1[r_1 x + f((1 - r_1)x,\, y)]$,  for $p_1 r_1 x/(1 - p_1) > p_2 r_2 y/(1 - p_2)$,
      $f(x, y) = p_2[r_2 y + f(x,\, (1 - r_2)y)]$,  for $p_1 r_1 x/(1 - p_1) < p_2 r_2 y/(1 - p_2)$.

For $p_1 r_1 x/(1 - p_1) = p_2 r_2 y/(1 - p_2)$ either choice is optimal.
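
As a numerical illustration of Theorem 2, one may follow the stated rule stage by stage and compare the resulting expected return with the finite-stage optimum of § 4 for a large number of stages. The Python sketch below is hedged accordingly: the function names, the convention $f_0 = 0$, the tie-breaking in favour of A, and the parameter values are assumptions introduced here. After 80 stages the neglected tail is below the tolerance used for these values.

```python
from functools import lru_cache


def index_policy_return(x, y, p1, r1, p2, r2, stages):
    """Expected return when each choice follows the rule of Theorem 2."""
    total, alive = 0.0, 1.0
    for _ in range(stages):
        if p1 * r1 * x / (1 - p1) >= p2 * r2 * y / (1 - p2):
            alive *= p1
            total += alive * r1 * x
            x *= 1 - r1
        else:
            alive *= p2
            total += alive * r2 * y
            y *= 1 - r2
    return total


def optimal_return(x, y, p1, r1, p2, r2, stages):
    """Finite-stage optimum from the recurrence of section 4 (assumed f_0 = 0)."""
    @lru_cache(maxsize=None)
    def f(n, i, j):
        if n == 0:
            return 0.0
        xs, ys = x * (1 - r1) ** i, y * (1 - r2) ** j
        return max(p1 * (r1 * xs + f(n - 1, i + 1, j)),
                   p2 * (r2 * ys + f(n - 1, i, j + 1)))

    return f(stages, 0, 0)


args = (3.0, 2.0, 0.7, 0.5, 0.5, 0.8)  # hypothetical x, y, p1, r1, p2, r2
assert abs(index_policy_return(*args, 80) - optimal_return(*args, 80)) < 1e-9
```

Agreement at a few points is of course no substitute for the proof; the sketch merely shows how cheaply the rule can be evaluated once the boundary line is known.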


§ 9. Discussion

The solution has a very interesting interpretation. We may consider
$p_1 r_1 x$ to be the immediate expected gain and $(1 - p_1)$ to be the
immediate expected loss. The theorem then asserts that the solution consists
of making the decision which at each instant maximizes the ratio of
immediate expected gain to immediate expected loss. As we shall see,
this intriguing criterion occurs from time to time throughout the theory
of dynamic programming.


§ 10. Some generalizations

The same methods suffice to prove the two results below.

Theorem 3. Consider the equation

(1)  $f(x, y) = \max\,[\,A\colon \sum_{k=1}^{N} p_k[c_k x + f(c'_k x,\, y)],\;\; B\colon \sum_{k=1}^{N} q_k[d_k y + f(x,\, d'_k y)]\,]$,

where $x, y \ge 0$ and

(2)  (a)  $p_k \ge 0$, $q_k \ge 0$, $\sum_{k=1}^{N} p_k < 1$, $\sum_{k=1}^{N} q_k < 1$,

     (b)  $1 \ge c_k,\ d_k \ge 0$, $c'_k + c_k = d'_k + d_k = 1$.

The optimal choice of operations is the following: If

(3)  $\dfrac{\sum_{k=1}^{N} p_k c_k}{1 - \sum_{k=1}^{N} p_k}\, x > \dfrac{\sum_{k=1}^{N} q_k d_k}{1 - \sum_{k=1}^{N} q_k}\, y$,

choose A; if the reverse inequality holds, choose B. In case of equality,
either choice is optimal.

Theorem 4. Consider the functional equation

(4)  $f(x_1, x_2, \ldots, x_N) = \max_i\,[\,\sum_{k=1}^{K} p_{ik}[c_{ik} x_i + f(x_1, x_2, \ldots, c'_{ik} x_i, \ldots, x_N)]\,]$,

where $x_i \ge 0$ and

(5)  (a)  $p_{ik} \ge 0$, $\sum_{k=1}^{K} p_{ik} < 1$, i = 1, 2, ..., N,

     (b)  $1 \ge c_{ik} \ge 0$, $c_{ik} + c'_{ik} = 1$.

The decision functions are

     $D_i(x) = \dfrac{\sum_{k=1}^{K} p_{ik} c_{ik}}{1 - \sum_{k=1}^{K} p_{ik}}\, x_i$,

in the sense that the index which yields the maximum of $D_i(x)$ for
i = 1, 2, ..., N is the index to be chosen in (4). In case of equality, it
is a matter of indifference as to which is used.
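
The decision functions of Theorem 4 are straightforward to evaluate. The Python sketch below is illustrative only; the function name, the nested-list data layout, and the sample numbers are assumptions made here. With K = 1 and two quantities it reduces to the rule of Theorem 2.

```python
def decision_index(x, p, c):
    """Evaluate the decision functions D_i(x) of Theorem 4.

    x[i]    -- current quantity x_i
    p[i][k] -- probability p_ik (each row sums to less than 1)
    c[i][k] -- fraction c_ik obtained under outcome k

    Returns (best, scores): the maximising index and the list of D_i values.
    """
    scores = []
    for xi, pi, ci in zip(x, p, c):
        gain = sum(pk * ck for pk, ck in zip(pi, ci))  # numerator of D_i
        loss = 1.0 - sum(pi)                           # denominator of D_i
        scores.append(gain / loss * xi)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical data: two mines, one outcome each (the setting of Theorem 2).
best, scores = decision_index([3.0, 2.0], [[0.7], [0.5]], [[0.5], [0.8]])
assert best == 0
```

In case of equality `max` returns the first maximising index, which is consistent with the theorem since the choice is then a matter of indifference.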

It is clear that we can combine Theorems 3 and 4 into one more
comprehensive result, which in turn can be generalized by the use of the
Stieltjes integral. Thus a version of (1) arising from a continuous
distribution of outcomes is

     $f(x, y) = \max\,[\,\int_0^1 [zx + f((1 - z)x,\, y)]\, dG(z),\;\; \int_0^1 [wy + f(x,\, (1 - w)y)]\, dH(w)\,]$.

We leave the derivation of the extensions of Theorems 3 and 4, and the
statements and proof of the corresponding existence and uniqueness
theorem, as exercises for the reader.

§ 11. The form of f(x, y)

Having obtained a very simple characterization of the optimal policy,
let us now turn our attention to the function $f(x, y)$. In general, no
simple analytic representation will exist. If, however, we consider the
equation

(1)  $f(x, y) = \max\,[\,a_1 x + a_2 y + p_1 f(c_1 x,\, y),\;\; b_1 x + b_2 y + q_1 f(x,\, d_1 y)\,]$,

we can show that if $c_1$ and $d_1$ are connected by a relation of the type
$c_1^m = d_1^n$, with m and n positive integers, a piece-wise linear
representation for $f(x, y)$ may be obtained.

It is sufficient, in order to illustrate the technique, to consider the
simplest case where the relation is $c_1 = d_1$.

Let (x, y) be a point in the A-region. If A is applied at (x, y), this point
is transformed into $(c_1 x, y)$, which may be in either an A- or a B-region.
Let $L_1$ be the line that is transformed into L⁶ when (x, y) goes into
$(c_1 x, y)$, let $L_2$ be the line transformed into $L_1$, and so on.
Similarly, let $M_1$ be the line transformed into L when (x, y) goes into
$(x, d_1 y)$, and so on. In the sector $L O L_1$, A is used first, followed
by B, as shown below.

Hence, for (x, y) in this sector we obtain

(2)  $f(x, y) = a_1 x + a_2 y + p_1 f(c_1 x,\, y)$
     $= a_1 x + a_2 y + p_1 (b_1 c_1 x + b_2 y) + p_1 q_1 f(c_1 x,\, c_1 y)$
     $= (a_1 + p_1 b_1 c_1)\, x + (a_2 + p_1 b_2)\, y + p_1 q_1 c_1 f(x, y)$.

⁶ The boundary line, whose equation is obtained as above, is
$[a_1(1 - q_1) - b_1(1 - p_1 c_1)]\, x = [b_2(1 - p_1) - a_2(1 - q_1 d_1)]\, y$.

This yields

     $f(x, y) = \dfrac{(a_1 + p_1 b_1 c_1)\, x + (a_2 + p_1 b_2)\, y}{1 - p_1 q_1 c_1}$

for (x, y) in $L O L_1$. Similarly, we obtain a linear expression for f in
$L O M_1$. Having obtained the representations in these sectors, it is clear
that we obtain linear expressions in $L_1 O L_2$, and so on.

§ 12. The problem for a finite number of stages

Let us first establish

Theorem 5. Consider the recurrence relations

(1)  $f_1(x, y) = \max\,(p_1 r_1 x,\; p_2 r_2 y)$,
     $f_{N+1}(x, y) = \max\,[\,A\colon p_1[r_1 x + f_N((1 - r_1)x,\, y)],\;\; B\colon p_2[r_2 y + f_N(x,\, (1 - r_2)y)]\,]$,
     N = 1, 2, ....

For each N, there are two decision regions.

Proof. For each $N \ge 2$, the points determined by the condition that AB
plus an optimal continuation for the remaining (N - 2) moves is equivalent
to BA plus an optimal continuation for the remaining (N - 2) moves lie on
the same line L we have determined above, namely

(2)  $p_1 r_1 x / (1 - p_1) = p_2 r_2 y / (1 - p_2)$.

Figure 4a

For the N-stage process, any policy, and consequently any optimal
policy, has the form

(3)  $S_N\colon A^{a_1} B^{b_1} \cdots A^{a_N} B^{b_N}$,

where the $a_i$ and $b_i$ are positive integers or zero, restricted by the
condition $\sum_i (a_i + b_i) = N$.

Let us now consider a point P = P(x, y) lying above L. If A is used at
P, there are two possibilities: either A is used k times in succession, and
then followed by B,

(4)  $S_N = A^k B \cdots$,  $k \le N - 1$,

or $S_N = A^N$. Let us consider the first case. If A is used (k - 1) times in
succession, we reach a point P' further above L. At P', AB cannot be the
first two moves in an optimal (N - k + 1)-stage policy, since BA plus
an optimal continuation is superior.

Consequently above L, either B is used first, or the optimal policy is
$A^N$. Let us now show that if $A^N$ is optimal at P, then it is optimal in
the region between OP and the x-axis.

To demonstrate this we begin with the observation that it is permissible
to assume that x + y = 1, $0 \le x, y \le 1$, because of the homogeneity of
$f_N(x, y)$ as a function of x and y. Considering the N-stage process,
we see that there are $2^N$ possible policies, say $P_1, P_2, \ldots, P_{2^N}$.
Each of these policies used at a point (x, y) yields an N-stage return which
is a linear function of x and y, say $L_k(x, y)$. For x + y = 1, we may plot
these functions obtaining a set of $2^N$ straight lines,

Figure 5

If N were 2, so that the four policies AA, AB, BA, BB yielded four lines
as above, the maximum return as a function of x would have the form

Figure 6

It is clear that $A^N$ is an optimal policy for y = 0, x = 1. It follows that
if $A^N$ is optimal at (x, y), 0 < y < 1, the line corresponding to $A^N$ will
dominate all other lines on the interval from x to 1.

Combining the above results we see that for any N, the boundary between
the A-region and the B-region will either be AB = BA or $A^N = M_1$,
where $M_1$ is a policy of complicated form, or $B^N = M_2$, where $M_2$ is
also a complicated policy.

We can now establish a sharper result:

Theorem 6. The decision regions for $f_N$ converge towards those of f as
$N \to \infty$ in a monotone fashion. There is always an integer $N_0$ with the
property that for $N \ge N_0$ the regions for $f_N$ are identical with those
of f.

Proof: Consider the situation for N = 3. Let $L_2$ be the boundary line
for the two-stage process, and assume that the relative positions of $L_2$
and L are as shown below.

Let $L_2(A^{-1})$ denote the line transformed into $L_2$ when A is used at a
point on $L_2(A^{-1})$, which is to say when (x, y) is transformed into
(cx, y). Let Q be a point in the sector between $L_2$ and $L_2(A^{-1})$. If A
is used at Q as the first move in a three-stage policy, B is used next, since
the transformed point is in the B-region for a two-stage process. However,
if Q is above L, we know that AB cannot be the first two moves of an
optimal policy. Hence B is used at Q. This shows that the B-region for the
three-stage process is at least that containing the region above
$L_2(A^{-1})$. This process may be continued for larger and larger N until
$L_k(A^{-1})$, for some finite k, lies below L. At this point, the boundary
line becomes AB = BA, and remains so for all larger N.

§ 13. A three-choice problem

Let us now assume that in addition to the two A and B choices already
discussed, we have a third choice which is a compromise between the A
and B choices. The equation we obtain in this case takes the form

(1)  $f(x, y) = \max\,[\,A\colon p_1[r_1 x + f((1 - r_1)x,\, y)],$
     $\qquad B\colon p_2[r_2 y + f(x,\, (1 - r_2)y)],$
     $\qquad C\colon p_3[r_3 x + r_4 y + f((1 - r_3)x,\, (1 - r_4)y)]\,]$,

where $0 \le r_3,\ r_4 \le 1$ and $0 \le p_3 < 1$, and the quantities
$p_1, p_2, r_1, r_2$ satisfy the previous inequalities.

On the basis of what we know concerning the solution of the equation
where the C-term is missing, it might be suspected that the solution of
this equation would be determined in the following way: There are three
decision regions, as in the figure below, with A, B and C each optimal
first choices in these regions.

Figure 8

Unfortunately, a counter-example has been constructed showing that
this is not true generally. It shows, by means of a fairly complicated but
straightforward calculation, that the solution can, for suitable values of
the parameters, take the form shown in Figure 9 below.

The solution of (1) above seems to be quite a difficult problem, and very
little is known concerning the character of the solution.

Figure 9

It is not even known whether or not the number of decision regions is
always finite and whether the number is uniformly bounded if finite. To
obtain some information about this problem in a part of the parameter
space, we shall consider a continuous version in Chapter 8, where with the
aid of variational techniques the decision regions may be determined.

For the continuous version they do assume the simple form shown in
the first figure above, Figure 8.

§ 14. A stability theorem

Let us now derive a stability theorem for the solution⁷ of the equation

(1)  $f(x, y) = \max\,[\,p_1[r_1 x + f((1 - r_1)x,\, y)],\;\; p_2[r_2 y + f(x,\, (1 - r_2)y)]\,]$.

⁷ By the term "solution", here and in the following pages, we shall mean the
unique solution in the appropriate function class.

Theorem 7. Let $g(x, y)$ be the solution of

(2)  $g(x, y) = \max\,[\,A\colon p_1[r_1 x + g((1 - r_1)x,\, y)],\;\; B\colon p_2[r_2 y + g(x,\, (1 - r_2)y)]\,] + h(x, y)$.

Then, in any rectangle R: $0 \le x \le X$, $0 \le y \le Y$,

(3)  $|f(x, y) - g(x, y)| \le \max_R |h(x, y)| / q$,

where $q = \min\,((1 - p_1),\ (1 - p_2))$.

Proof. The proof proceeds by successive approximations, as in the
corresponding section in Chapter 1. Consequently, we shall merely sketch
the details. Set

(4)  $f_1(x, y) = \max\,[\,p_1 r_1 x,\; p_2 r_2 y\,]$,
     $g_1(x, y) = \max\,[\,p_1 r_1 x,\; p_2 r_2 y\,] + h(x, y)$,

and, generally,

(5)  $f_{n+1}(x, y) = \max\,[\,A\colon p_1[r_1 x + f_n((1 - r_1)x,\, y)],\;\; B\colon p_2[r_2 y + f_n(x,\, (1 - r_2)y)]\,]$,
     $g_{n+1}(x, y) = \max\,[\,A\colon p_1[r_1 x + g_n((1 - r_1)x,\, y)],\;\; B\colon p_2[r_2 y + g_n(x,\, (1 - r_2)y)]\,] + h(x, y)$.

It is clear that

(6)  $|f_1(x, y) - g_1(x, y)| \le \max_R |h(x, y)|$.

Applying the techniques used repeatedly above, we see that

(7)  $\max_R |f_{n+1}(x, y) - g_{n+1}(x, y)| \le p_3 \max_R |f_n(x, y) - g_n(x, y)| + \max_R |h|$,

where $p_3 = \max\,(p_1, p_2)$. Iteration of this inequality yields

(8)  $\max_R |f_n(x, y) - g_n(x, y)| \le \max_R |h|\, (1 + p_3 + \cdots + p_3^{\,n-1})$,

for n = 2, 3, .... Letting $n \to \infty$ we obtain the stated result.
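
The bound of Theorem 7 can be observed on the successive approximations themselves. In the Python sketch below the function name, the convention $g_0 = 0$ (which reproduces (4) for n = 1), the particular perturbation h, and the parameter values are all assumptions chosen for illustration; any h with $|h| \le \varepsilon$ on the rectangle would serve.

```python
import math
from functools import lru_cache


def perturbed_value(x, y, p1, r1, p2, r2, n, h):
    """n-th approximation g_n(x, y) of (5); h = 0 gives f_n (assumed g_0 = 0)."""
    @lru_cache(maxsize=None)
    def g(m, i, j):
        if m == 0:
            return 0.0
        xs, ys = x * (1 - r1) ** i, y * (1 - r2) ** j
        a = p1 * (r1 * xs + g(m - 1, i + 1, j))
        b = p2 * (r2 * ys + g(m - 1, i, j + 1))
        return max(a, b) + h(xs, ys)

    return g(n, 0, 0)


eps = 1e-3                                  # hypothetical size of the perturbation
h = lambda s, t: eps * math.sin(5 * s + t)  # satisfies |h| <= eps everywhere
p1, r1, p2, r2 = 0.7, 0.5, 0.5, 0.8         # hypothetical values

f_n = perturbed_value(3.0, 2.0, p1, r1, p2, r2, 60, lambda s, t: 0.0)
g_n = perturbed_value(3.0, 2.0, p1, r1, p2, r2, 60, h)
bound = eps / min(1 - p1, 1 - p2)           # right-hand side of (3)
assert abs(f_n - g_n) <= bound
```

The estimate is a worst-case one; for an oscillating perturbation such as the one chosen here the observed difference is typically well inside the bound.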


Exercises and Research Problems for Chapter II

1. With reference to the process described in § 2, consider the case
where the purpose of the process is to maximize the expected value of
$\varphi(R)$, where R is the total return, and $\varphi(z)$ is a given
function of z. Define the function

     $f(x, y, a)$ = expected value of $\varphi(R)$ obtained employing an
     optimal policy with initial quantities x and y in the respective mines
     and a quantity a already mined.

Show that $f(x, y, a)$ satisfies the following functional equation

     $f(x, y, a) = \max\,[\,p_1 f(r_1' x,\, y,\, a + r_1 x) + p_1' \varphi(a),\;\; p_2 f(x,\, r_2' y,\, a + r_2 y) + p_2' \varphi(a)\,]$,  $x, y \ge 0$,

     $f(0, 0, a) = \varphi(a)$.

Here $p_1' = 1 - p_1$, $p_2' = 1 - p_2$, $r_1' = 1 - r_1$, $r_2' = 1 - r_2$.

2. Establish an existence and uniqueness theorem for this equation.

3. Consider the case where $\varphi(z)$ is defined as follows: $\varphi(z) = 0$,
$0 \le z < u$, $\varphi(z) = 1$, $z \ge u$, where u < x + y.


4. Let $g(x, y) = \max_P \mathrm{Exp}\,(e^{bR})$, b > 0, where Exp stands for
expected value and we maximize over all policies P. Show that $g(x, y)$
satisfies the equation

     $g(x, y) = \max\,[\,A\colon p_1 e^{b r_1 x} g(r_1' x,\, y) + p_1',\;\; B\colon p_2 e^{b r_2 y} g(x,\, r_2' y) + p_2'\,]$.

5. Show that the solution of the above equation is determined by the
relation between the functions $p_1 (e^{b r_1 x} - 1)/p_1'$ and
$p_2 (e^{b r_2 y} - 1)/p_2'$.

6. Show that Theorem 2 is the limiting case of this result as $b \to 0$.

7. The function $g(x, 0)$ satisfies the equation

     $g(x, 0) = p_1 e^{b r_1 x} g(r_1' x,\, 0) + p_1'$.

Obtain its asymptotic behavior as $x \to \infty$.

8. Referring to Problem 1 obtain some sufficient conditions upon
$\varphi(x)$ which will ensure precisely two decision regions.


9. Solve the equation

     $f(x, y) = \max\,[\,A\colon p_1[r_1 x + f(r_1' x,\, y)],\;\; B\colon p_2[r_2 y + f(x,\, r_2' y)],\;\; C\colon p_3[r_3 x + r_4 y + f(tx,\, ty)]\,]$.

10. Solve the equation

     $f(x, y) = \max\,[\,A\colon x + f(ax,\, by),\;\; B\colon y + f(cy,\, dx)\,]$,

assuming that $0 \le a, b, c, d < 1$.

(Gross-Shapiro)


11. Consider the process described in § 2 under the assumption that there
is a probability $p_1$ of obtaining $r_1 x$ and continuing, a probability
$p_2$ of obtaining nothing and continuing, and a probability $p_3$ of
obtaining nothing and terminating, if A is chosen, with
$p_1 + p_2 + p_3 = 1$, with similar probabilities $q_1, q_2, q_3$ if B is
chosen. Show that the corresponding functional equation is

     $f(x, y) = \max\,[\,A\colon p_1[r_1 x + f((1 - r_1)x,\, y)] + p_2 f(x, y),\;\; B\colon q_1[s_1 y + f(x,\, (1 - s_1)y)] + q_2 f(x, y)\,]$

and that this may be written in the simpler form

     $f(x, y) = \max\,[\,A\colon \dfrac{p_1}{1 - p_2}[r_1 x + f((1 - r_1)x,\, y)],\;\; B\colon \dfrac{q_1}{1 - q_2}[s_1 y + f(x,\, (1 - s_1)y)]\,]$.

12. Consider the process described in § 2 in which it is not possible to
observe the effect of any of the decisions once the process has started.
Discuss the problem of determining the policies maximizing the expected
return in the following situations:

a. when the machine is undamaged, it mines a fixed fraction of the
gold in any particular mine.

b. when the machine is undamaged, there is a distribution of returns.

Suppose that we wish to maximize the probability that the return exceeds
a fixed quantity $R_0$.

13. Consider the process described in § 2 under the assumption that the
machine mines a fixed quantity in each mine, dependent upon the mine, in
place of a fixed fraction, as long as the amount remaining in the mine
exceeds the fixed amount.


14. Show that the equation in (5.1) is equivalent to

     $f(z) = \max\,[\,A\colon p_1[r_1 + (1 - r_1) f(z/(1 - r_1))],\;\; B\colon p_2[r_2 z + f((1 - r_2) z)]\,]$,

for $0 \le z < \infty$.

15. Consider the equation

     $f(x, y) = \max\,[\,rx + f((1 - r)x,\, y),\;\; q[sy + f(x,\, (1 - s)y)]\,]$,

for $x, y \ge 0$, $0 \le r, s, q < 1$.
Show that a solution is

     $f(x, y) = x + \dfrac{sqy}{1 - q(1 - s)}$.

16. Show that the gold-mining process generating this equation possesses
no optimal policy, i.e. no policy yielding this return, but that there are
arbitrarily many policies yielding a return of more than

     $x + \dfrac{sqy}{1 - q(1 - s)} - d$  for any d > 0.

17. Prove that the solution above is not unique in the class of bounded
functions over any bounded rectangle, but that it is unique over the class
of functions $f(x, y)$ for which $f(0, 0) = 0$, $f(x, y)$ is continuous at
x = y = 0.


Bibliography and Comments for Chapter II

§ 1. The concept of "decision regions" is a very important one in the
study of decision processes. We shall meet it again in Chapter VIII, where it
guides us to the solution of the variational problems treated there, and again
in Chapter IX, in connection with variational problems with constraints. An
interesting paper in this connection is K. J. Arrow, D. Blackwell, M. Girshick,
"Bayes and Minimax Solutions of Sequential Decision Problems," Econometrica,
vol. 17 (1949), pp. 213-244.

§ 8. The result of § 8 was obtained in conjunction with M. Shiffman in
the summer of 1950.

§ 12. The type of geometric argument used here was extensively developed
by S. Karlin and H. N. Shapiro to give an alternative proof of Theorem 2
and other results.

§ 13. The first counter-example was obtained by S. Karlin and H. N.
Shapiro after a great deal of fruitless effort had been expended attempting
to establish a result based upon Figure 8. See S. Karlin and H. N. Shapiro,
"Decision Processes and Functional Equations," RM-933, Sept. 1952, The
RAND Corporation.
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The  Structure  of  Dynamic  Programming  Processes 

§  1.  Introduction 

In  this  chapter  we  wish  to  examine  and  compare  the  essential  features 
of  the  two  processes  we  have  considered  in  some  detail  in  the  first  and 
second  chapters.  Disparate  as  these  processes  may  seem  at  first  glance, 
one  being  of  deterministic  type  with  a  stochastic  version  and  the  other 
of  a  stochastic  type  with  no  deterministic  version,  we  shall  see  that  from 
an  abstract  point  of  view  they  are  examples  of  the  same  general  type  of 
process.  It  is  therefore  no  accident  that  they  are  governed  by  functional 
equations  of  a  similar  form. 

After a discussion and analysis of these similarities, we shall consider
the formulation of the more general decision processes and from these
derive a number of functional equations possessing a common structure.
We could, if we so desired, condense these into one all-embracing
functional equation. However, since extreme generality is only gained at the
expense of fine detail, it seems decidedly better, from both a conceptual
and analytic point of view, to consider separately a number of important
sub-categories of processes, each of which possesses certain distinctive
mathematical and physical features.

We shall close the chapter with a further discussion of the concept of
approximation in function space, which we have already encountered in
the previous chapters, and a demonstration of its most important
property, that of monotone convergence.

§ 2. Discussion of the two preceding processes

Let  us  begin  by  observing  that  the  processes  discussed  in  Chapters  I 
and  II  have  the  following  features  in  common: 

a.  In  each  case  we  have  a  physical  system  characterized  at  any  stage 
by  a  small  set  of  parameters,  the  state  variables. 

b.  At  each  stage  of  either  process  we  have  a  choice  of  a  number  of 
decisions. 

c.  The  effect  of  a  decision  is  a  transformation  of  the  state  variables. 

d.  The  past  history  of  the  system  is  of  no  importance  in  determining 
future  actions. 
e. The purpose of the process is to maximize some function of the state
variables.

We have purposely left the description a little vague, since it is the
spirit of the approach to these processes that is significant rather than the
letter of some rigid formulation. It is extremely important to realize that
one can neither axiomatize mathematical formulation nor legislate away
ingenuity. In some problems, the state variables and the transformations
are forced upon us; in others there is a choice in these matters and the
analytic solution stands or falls upon this choice; in still others, the state
variables and sometimes the transformations must be artificially
constructed. Experience alone, combined with often laborious trial and error,
will yield suitable formulations of involved processes.

Let  us  now  identify  the  two  processes  discussed  in  the  foregoing 
chapters  with  the  description  given  above. 

In the unbounded multi-stage allocation process, the state variables
are $x$, the quantity of resources, and $z$, the return obtained up to the
current stage. The decision at any stage consists of an allocation of a
quantity $y$ to the first activity, where $0 \le y \le x$. This decision has the
effect of transforming $x$ into $ay + b(x - y)$ and $z$ into $z + g(y) + h(x - y)$.
The purpose of the process is to maximize the final value of $z$.

In the stochastic gold-mining process, the state variables are $x$ and $y$,
the present levels of the two mines, and $z$, the amount of gold mined to
date. The decision at any stage consists of a choice of Anaconda or
Bonanza. If Anaconda is chosen, $(x, y)$ goes into $((1 - r_1)x, y)$ and $z$ into
$z + r_1 x$, and if Bonanza, $(x, y)$ goes into $(x, (1 - r_2)y)$ and $z$ into $z + r_2 y$.
The purpose of the process is to maximize the expected value of $z$ obtained
before the machine is defunct.

In the finite versions of both processes, we have the additional
parameter of time, manifesting itself in the form of the number of stages
remaining in the process. It is, however, very useful to keep this state
variable distinct from the others, since, as usual, time plays a unique role.

Let  us  now  agree  to  the  following  terminology:  A  policy  is  any  rule  for 
making  decisions  which  yields  an  allowable  sequence  of  decisions;  and  an 
optimal  policy  is  a  policy  which  maximizes  a  preassigned  function  of  the 
final  state  variables.  A  more  precise  definition  of  a  policy  is  not  as  readily 
obtained  as  might  be  thought.  Although  not  too  difficult  for  deterministic 
processes,  stochastic  processes  require  more  care.  For  any  particular 
process, it is not difficult to render the concept exact. The key word is,
of course, "allowable".

A convenient term for this preassigned function of the final state
variables is criterion function. In many applications, the determination of a
proper criterion function is a matter of some difficulty. From the analytic
point of view, a solution may be quite easy to obtain for one criterion
function, and quite difficult for a closely related one. It is well,
consequently, to retain a certain degree of flexibility in the choice of such
functions.

§  3.  The  principle  of  optimality 

In each process, the functional equation governing the process was
obtained by an application of the following intuitive principle:

Principle of Optimality. An optimal policy has the property that
whatever the initial state and initial decision are, the remaining decisions must
constitute an optimal policy with regard to the state resulting from the first
decision.

The  mathematical  transliteration  of  this  simple  principle  will  yield  all 
the  functional  equations  we  shall  encounter  throughout  the  remainder 
of  the  book.  A  proof  by  contradiction  is  immediate. 

§  4.  Mathematical  formulation — I.  A  discrete  deterministic 
process 

Let us now consider a deterministic process, by which we mean that the
outcome of a decision is uniquely determined by the decision, and assume
that the state of the system, apart from the time dependence, is described
at any stage by an $M$-dimensional vector $p = (p_1, p_2, \ldots, p_M)$,
constrained to lie within some region $D$. Let $T = \{T_q\}$, where $q$ runs over a
set $S$ which may be finite, enumerable, composed of continua, or a
combination of sets of this type, be a set of transformations with the property
that $p \in D$ implies that $T_q(p) \in D$ for all $q \in S$, which is to say that any
transformation $T_q$ carries $D$ into itself.

The  term  “discrete”  signifies  here  that  we  have  a  process  consisting  of 
a  finite  or  denumerably  infinite  number  of  stages. 

A policy, for the finite process which we shall consider first, consists of
a selection of $N$ transformations in order, $P = (T_1, T_2, \ldots, T_N)$,¹ yielding
successively the sequence of states

(1)  $p_1 = T_1(p)$,
     $p_2 = T_2(p_1)$,
     $\vdots$
     $p_N = T_N(p_{N-1})$.

These transformations are to be chosen to maximize a given function,
$R$, of the final state $p_N$.

¹ where we write $T_1$ for $T_{q_1}$, $T_2$ for $T_{q_2}$, and so on.
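
The sequence of states in (1) can be traced mechanically. The following is a minimal sketch only: the region D = {0, 1, ..., 10}, the three transformations and the criterion function R are hypothetical and were chosen purely for illustration.

```python
from typing import Callable, Dict, List, Sequence

State = int

# Hypothetical family of transformations T_q on the region D = {0, 1, ..., 10};
# each of them carries D into itself.
T: Dict[str, Callable[[State], State]] = {
    "up": lambda p: min(p + 3, 10),
    "down": lambda p: max(p - 1, 0),
    "double": lambda p: min(2 * p, 10),
}


def R(p: State) -> int:
    """Hypothetical criterion function of the final state."""
    return -(p - 7) ** 2


def run_policy(p: State, policy: Sequence[str]) -> List[State]:
    """Return the states p_1, ..., p_N obtained by applying the policy in order."""
    states = []
    for q in policy:
        p = T[q](p)
        states.append(p)
    return states


states = run_policy(1, ["up", "double", "down"])
print(states, R(states[-1]))
```

Only the last entry of `states` enters the criterion function; the intermediate states matter solely through the final state they lead to.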
There are a number of cases in which it is easy to see that a maximum
will exist, in which case an optimal policy exists. The simplest is that
where there are only a finite number of allowable choices for $q$ at each
stage. Perhaps next in order of simplicity is where we assume that $D$ is a
finite closed region, with $R(p)$ continuous in $p$ for $p \in D$, $T_q(p)$ jointly
continuous in $p$ and $q$ for all $p \in D$ and all $q$ belonging to a finite closed
region $S$.

These  two  cases  cover  the  most  important  of  the  finite  processes,  while 
their  limiting  forms  account  for  the  unbounded  processes. 

Observe that the maximum value of $R(p_N)$, as determined by an
optimal policy, will be a function only of the initial vector $p$ and the
number of stages $N$. Let us then define our basic auxiliary functions

(2)  $f_N(p) = \max_P R(p_N)$
     = the $N$-stage return obtained starting from an initial
       state $p$ and using an optimal policy.

This sequence is defined for $N = 1, 2, \ldots$, and for $p \in D$.

Simple as this step is, it represents a fundamental principle in analysis,
the principle of continuity. In order to solve our original problem
involving one initial vector, $p$, and a multi-stage process of a definite number of
stages, $N$, we consider the entire set of maximization problems arising
from arbitrary values of $p$ and from an arbitrary number of stages.

The original process has thus been imbedded within a family of similar
processes. In place of attempting to determine the characteristics of an
optimal policy for an isolated process, we shall attempt to deduce the
common properties of the set of optimal policies possessed by the
members of the family.

This procedure will enable us to resolve the original problem in a
number of cases where direct methods fail.

To derive a recurrence relation connecting the members of the sequence
$\{f_N(p)\}$, let us employ the principle of optimality stated above in § 3.
Assume that we choose some transformation $T_q$ as a result of our first
decision, obtaining in this way a new state vector, $T_q(p)$. The maximum
"return"² from the following $(N - 1)$ stages is, by definition, $f_{N-1}(T_q(p))$.
It follows that if we wish to maximize the total $N$-stage return, $q$ must
now be chosen so as to maximize this $(N - 1)$-stage return. The result
is the basic recurrence relation

(3)  $f_N(p) = \max_{q \in S} f_{N-1}(T_q(p))$,

for $N \ge 2$, with

(4)  $f_1(p) = \max_{q \in S} R(T_q(p))$.

² i.e. the value of the criterion function.
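
For a finite set of states and decisions, relations (3) and (4) translate directly into a memoized recursion. The sketch below is an illustration under assumptions: the decision set, the transformations and the criterion function are hypothetical, and `optimal_policy` recovers only one of possibly several maximizing sequences.

```python
from functools import lru_cache
from typing import Tuple

DECISIONS = ("up", "down", "double")  # hypothetical index set S


def transform(q: str, p: int) -> int:
    """Hypothetical T_q(p), mapping D = {0, ..., 10} into itself."""
    if q == "up":
        return min(p + 3, 10)
    if q == "down":
        return max(p - 1, 0)
    return min(2 * p, 10)


def R(p: int) -> int:
    """Hypothetical criterion function of the final state."""
    return -(p - 7) ** 2


@lru_cache(maxsize=None)
def f(n: int, p: int) -> int:
    """f_N(p): the largest value of R(p_N) over all N-stage policies from p."""
    if n == 1:
        return max(R(transform(q, p)) for q in DECISIONS)      # relation (4)
    return max(f(n - 1, transform(q, p)) for q in DECISIONS)   # relation (3)


def optimal_policy(n: int, p: int) -> Tuple[str, ...]:
    """Recover one maximizing sequence of decisions; it need not be unique."""
    policy = []
    for stages_left in range(n, 0, -1):
        def value(q: str) -> int:
            nxt = transform(q, p)
            return R(nxt) if stages_left == 1 else f(stages_left - 1, nxt)

        q_best = max(DECISIONS, key=value)
        policy.append(q_best)
        p = transform(q_best, p)
    return tuple(policy)


print(f(3, 1), optimal_policy(3, 1))
```

The table of values `f(n, p)` is computed for every state that can be reached, which is the imbedding of the single problem in a family of problems described above.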

Observe that $f_N(p)$ is unique, but that the $q$ which maximizes is not
necessarily so. Thus the maximum return is uniquely determined, but
there may be many optimal policies which yield this return.

For the case of an unbounded process, the sequence $\{f_N(p)\}$ is replaced
by a single function $f(p)$, the total return obtained using an optimal
policy starting from state $p$, and the recurrence relation is replaced by the
functional equation

(5)  $f(p) = \max_{q \in S} f(T_q(p))$.

§  5.  Mathematical  formulation — II.  A  discrete  stochastic 
process 

Let  us  once  again  consider  a  discrete  process,  but  one  in  which  the 
transformations  which  occur  are  stochastic  rather  than  deterministic. 

A decision now results in a distribution of transformations, rather than
a single transformation. The initial vector $p$ is transformed into a
stochastic vector $z$ with an associated distribution function $dG_q(p, z)$,
dependent upon $p$ and the choice $q$.

Two distinct types of processes arise, depending upon whether we
assume that $z$ is known after the decision has been made and before the
next decision has to be made, or whether we assume that only the
distribution function is known. We shall only consider processes of the first
type in this volume, since processes of the second type require in general
the concept of functions of functions, which is to say functionals.

It  is  clear,  as  we  have  stated  several  times  before,  that  it  is  now  on  the 
whole  meaningless  to  speak  of  maximizing  the  return.  Rather  we  must 
agree  to  measure  the  value  of  a  policy  in  terms  of  some  average  value  of 
the  function  of  the  final  state.  Let  us  call  this  expected  value  the  return. 

Beginning with the case of a finite process, we define $f_N(p)$ as in (4.2).
If $z$ is a state resulting from any initial transformation $T_q$, the return
from the last $N - 1$ stages will be $f_{N-1}(z)$, upon the employment of an
optimal policy. The expected return as a result of the initial choice of $T_q$
is therefore

(1)  $\int_{z \in D} f_{N-1}(z) \, dG_q(p, z)$.

Consequently, the recurrence relation for the sequence $\{f_N(p)\}$ is

(2)  $f_N(p) = \max_{q \in S} \int_{z \in D} f_{N-1}(z) \, dG_q(p, z)$,  $N \ge 2$,

with

(3)  $f_1(p) = \max_{q \in S} \int_{z \in D} R(z) \, dG_q(p, z)$.

Considering the unbounded process, we obtain the functional relation

(4)  $f(p) = \max_{q \in S} \int_{z \in D} f(z) \, dG_q(p, z)$.
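
When the distribution $dG_q(p, z)$ is concentrated on finitely many states, the integrals in (2) and (3) reduce to sums. The following sketch assumes such a discrete distribution; the states, the two decisions "safe" and "risky", the probabilities and the criterion function are all hypothetical.

```python
from functools import lru_cache
from typing import Dict

STATES = (0, 1, 2, 3)
DECISIONS = ("safe", "risky")  # hypothetical index set S


def distribution(q: str, p: int) -> Dict[int, float]:
    """Hypothetical discrete analogue of dG_q(p, z): next state -> probability."""
    top = max(STATES)
    if q == "safe":
        return {min(p + 1, top): 1.0}
    # "risky": move two states up, or fall back to 0, with equal probability
    return {min(p + 2, top): 0.5, 0: 0.5}


def R(z: int) -> float:
    """Hypothetical criterion function of the final state."""
    return float(z * z)


@lru_cache(maxsize=None)
def f(n: int, p: int) -> float:
    """Expected N-stage return from state p under an optimal policy."""

    def expected(q: str) -> float:
        dist = distribution(q, p)
        if n == 1:
            return sum(prob * R(z) for z, prob in dist.items())      # relation (3)
        return sum(prob * f(n - 1, z) for z, prob in dist.items())   # relation (2)

    return max(expected(q) for q in DECISIONS)


print(f(1, 0), f(2, 0))
```

The recursion presupposes a process of the first type: the state z actually reached is observed before the next decision is made.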

§ 6. Mathematical formulation — III. A continuous deterministic
process

There  are  a  number  of  interesting  processes  that  require  that  decisions 
be  made  at  each  point  of  a  continuum,  such  as  a  time  interval.  The 
simplest  examples  of  processes  of  this  character  are  furnished  by  the 
calculus  of  variations.  As  we  shall  see  in  Chapter  IX  below,  this  conception 
of  the  calculus  of  variations  leads  to  a  new  view  of  various  parts  of  this 
classical  theory. 

Let us define

(1)  $f(p; T)$ = the return obtained over a time interval $[0, T]$ starting
     from the initial state $p$ and employing an optimal policy.

Although we consider the process as one consisting of choices made at
each point $t$ on $[0, T]$, it is better to begin with the concept of choosing
policies, which is to say functions, over intervals, and then pass to the
limit as these intervals shrink to points. The analogue of (4.3) is

(2)  $f(p; S + T) = \max_{D[0, S]} f(p_D; T)$,

where the maximum is taken over all allowable decisions made over the
interval $[0, S]$, and $p_D$ denotes the state resulting from these decisions.

As soon as we consider infinite processes, occurring as the result of
either unbounded sequences of operations, or because of choices made
over continua, we are confronted with the difficulty of establishing the
existence of an actual maximum rather than a supremum. In general,
therefore, in the discussion of processes of continuous type, it is better to
use initially the equation

(3)  $f(p; S + T) = \sup_{D[0, S]} f(p_D; T)$,

which is usually easy to establish, and then show, under suitable
assumptions, that the maximum is actually attained.

As we shall see in Chapter IX, the limiting form of (2) as $S \to 0$ is a
non-linear partial differential equation. This is the important form for actual
analytic utilization. For numerical purposes, $S$ is kept non-zero but small.³

³ We shall show, in Chapter IX, that it is possible to avoid many of the quite
difficult rigorous details involved in this limiting procedure if we are interested
only in the computational solution of variational processes.
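
To indicate what keeping S non-zero but small looks like computationally, here is a rough sketch under stated assumptions. The system dx/dt = u with u restricted to the values -1, 0, 1, the criterion function R and the step length H are hypothetical; the decision is held constant over each interval of length H, so that relation (2) with S = H becomes a finite recurrence on a grid.

```python
from functools import lru_cache

H = 0.1                 # the small, non-zero interval length S (assumed value)
CONTROLS = (-1, 0, 1)   # hypothetical admissible velocities, dx/dt = u
TARGET = 1.0


def R(x: float) -> float:
    """Hypothetical criterion function of the final state x(T)."""
    return -(x - TARGET) ** 2


@lru_cache(maxsize=None)
def f(k: int, steps: int) -> float:
    """Approximation to f(p; T) for p = k * H and T = steps * H."""
    if steps == 0:
        return R(k * H)  # no time left: the return is the criterion itself
    # Relation (2) with S = H: choose u on [0, H], then proceed optimally.
    return max(f(k + u, steps - 1) for u in CONTROLS)


print(f(0, 5), f(0, 10))
```

With five steps the state can only reach 0.5, so the return stays negative; with ten steps the target can be reached exactly.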
§ 7. Continuous stochastic processes

An interesting and challenging question which awaits further exploration
is the formulation and solution of general classes of continuous
stochastic decision processes of both one-person and two-person variety.
Although we shall discuss a particular process in Chapter VIII, we shall not
discuss the general formulation of continuous stochastic decision processes
here, since a rigorous treatment requires delicate and involved
argumentation based upon sophisticated concepts.

§  8.  Generalizations 

It  will  be  apparent  to  the  reader  that  the  functional  equations  we  have 
derived  above  for  the  case  where  the  state  variables  and  the  decision 
variables  were  constrained  to  finite  dimensional  Euclidean  spaces  can  be 
extended  to  cover  the  case  where  the  state  variables  and  decision  variables 
are  elements  of  more  general  mathematical  spaces,  such  as  Banach 
spaces. 

Rather than present this extension abstractly we prefer to wait until a
second volume where we will discuss examples of these more general
processes. The theory of integral equations and variational problems
involving functions of several variables, as well as more general stochastic
processes, all afford examples of processes which escape the finite
dimensional formulation to which we have restricted ourselves in this volume,
and require for their formulation in the foregoing terms the theory of
functionals and operations.

§ 9. Causality and optimality

Consider a multi-stage process involving no decisions, say one generated
by the system of differential equations

(1)  $dx_i/dt = g_i(x_1, x_2, \ldots, x_N)$,  $x_i(0) = c_i$,  $i = 1, 2, \ldots, N$,

which may, more compactly, be written in vector form

(2)  $dx/dt = g(x)$,  $x(0) = c$.

The state of the system at time $t$, taking for granted existence and
uniqueness of the solution, is a function only of $c$ and $t$; thus we may write

(3)  $x(t) = f(c, t)$.

The uniqueness of the solution leads to the functional equation

(4)  $f(c, s + t) = f(f(c, s), t)$,

for $s, t > 0$, an analytical transliteration of the law of causality. This
equation expresses the fundamental semi-group property of processes of this
type.
Comparing (4) above with (6.2), we see that we may regard multi-stage
decision processes as furnishing a natural extension of the theory of
semi-groups. Any further discussion here along these lines would carry
us beyond our self-imposed limits, and we shall consequently content
ourselves with the above observation.
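
The semi-group relation (4) can be checked numerically on a simple case. The sketch below assumes the hypothetical scalar system dx/dt = -x, x(0) = c, whose solution is known in closed form; it illustrates the relation and proves nothing in general.

```python
import math


def flow(c: float, t: float) -> float:
    """State at time t of dx/dt = -x, x(0) = c (a hypothetical example system)."""
    return c * math.exp(-t)


c, s, t = 2.0, 0.3, 1.1
direct = flow(c, s + t)          # f(c, s + t)
composed = flow(flow(c, s), t)   # f(f(c, s), t)
print(math.isclose(direct, composed, rel_tol=1e-12))
```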

§  10.  Approximation  in  policy  space 

In solving a functional equation such as (4.4) or (5.3), we shall in Chapter
IV make use of that general factotum of analysis, the method of successive
approximations. The method, very briefly, consists of choosing an initial
function $f_0(p)$, and then determining a sequence of functions, $\{f_N(p)\}$, by
means of the algorithm

(1)  $f_N(p) = \max_q f_{N-1}(T_q(p))$,  $N = 1, 2, \ldots$,

as, for instance, in (4.4). We have already employed this method in dealing
with the equations of Chapters I and II.

In many important cases, this method, after a suitable preliminary
preparation of the equation, actually leads to a convergent sequence
whose limit yields the solution of the functional equation.⁴ We shall make
extensive use of it in the following chapter.

In the theory of dynamic programming, however, we have an alternate
method of approximation which is equally important in its own right, a
method which we call "approximation in policy space".

Before discussing this method of approximation, let us observe that
there is a natural duality existing in dynamic programming processes
between the function $f(p)$ measuring the overall return, and the optimal
policy (or policies) which yields this return. Each can be used to determine
the other, with the additional feature that a knowledge of $f(p)$ yields all
optimal policies, since it determines all maximizing indices $q$ in an
equation such as (4.4), while a knowledge of any particular optimal policy
yields $f(p)$.

The maximizing index $q$ can be considered to be a function of $p$. If the
index is not unique, we have a multi-valued function. Whereas we call
$f(p)$ an element in function space, let us call $q = q(p)$ an element of
policy space. Both spaces are, of course, function spaces, but it is worth
distinguishing between them, since their elements are quite different in
meaning.

It follows now that we have two ways of making an initial approximation.
We may approximate to $f(p)$, as we do ordinarily in the method of
successive approximation, or we may, and this is a feature of the functional
equations belonging to dynamic programming processes, approximate
initially in policy space.⁵

⁴ It is interesting to observe that in many theories, as, for example, partial
differential equations, the preliminary transformation of the equation is of such
a nature that the principal difficulty of the existence proof resides in the
demonstration that the limit function actually satisfies the original equation.

Choosing an initial approximation $q_0 = q_0(p)$, we compute the return
from this policy by means of the functional equation

(2)  $f_0(p) = f_0(T_{q_0}(p))$.

We have already given an example of this in § 11 of Chapter I.

There are now two ways we can proceed. Taking the function of $q$,
$f_0(T_q(p))$, we can determine a function $q(p)$ which maximizes. Call this
function $q_1(p)$. Using this new policy, we determine $f_1(p)$, the new return,
by means of the functional equation

(3)  $f_1(p) = f_1(T_{q_1}(p))$.

This equation is solved iteratively, as in (1.3) and (1.4) of Chapter I.
Continuing in this way, we obtain two sequences $\{f_N(p)\}$ and $\{q_N(p)\}$.
In place of this procedure, we can define

(4)  $f_1(p) = \max_q f_0(T_q(p))$,

and then continue inductively, employing the usual method of successive
approximations,

(5)  $f_{N+1}(p) = \max_q f_N(T_q(p))$.

It is immediate that $f_1 \ge f_0$ and thus that the sequence $\{f_N\}$ is monotone
increasing. We shall discuss the convergence of this process in the
next chapter.
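
The monotone behaviour can be observed on a small discretized example. The sketch below is an assumption-laden illustration, not the general case: it uses the allocation equation of Chapter I, in which an immediate return g(y) + h(x - y) is added to the return from the new state, restricts the resources to integers, and takes hypothetical functions g, h and factors a, b. The initial function f_0 is the return from the policy that allocates everything to the first activity.

```python
import math
from typing import Callable, List

X_MAX = 20          # resources are restricted to the integers 0, 1, ..., X_MAX
A, B = 0.5, 0.75    # hypothetical depletion factors a and b


def stage_return(x: int, y: int) -> float:
    """Immediate return g(y) + h(x - y), with hypothetical g and h."""
    return math.sqrt(y) + 0.8 * math.sqrt(x - y)


def next_state(x: int, y: int) -> int:
    """Resources left after allocating y to the first activity, rounded down."""
    return int(A * y + B * (x - y))


def policy_return(policy: Callable[[int], int]) -> List[float]:
    """Return from a fixed policy; states shrink, so one upward sweep suffices."""
    f = [0.0] * (X_MAX + 1)
    for x in range(1, X_MAX + 1):
        y = policy(x)
        f[x] = stage_return(x, y) + f[next_state(x, y)]
    return f


def improve(f: List[float]) -> List[float]:
    """One step of successive approximation: maximize over all allocations y."""
    return [
        max(stage_return(x, y) + f[next_state(x, y)] for y in range(x + 1))
        for x in range(X_MAX + 1)
    ]


sequence = [policy_return(lambda x: x)]   # f_0 from the initial policy y_0(x) = x
for _ in range(X_MAX + 2):
    nxt = improve(sequence[-1])
    if nxt == sequence[-1]:
        break
    sequence.append(nxt)

monotone = all(
    later[x] >= earlier[x]
    for earlier, later in zip(sequence, sequence[1:])
    for x in range(X_MAX + 1)
)
print(len(sequence), monotone)
```

Starting from the return of an actual policy is what makes the first step an improvement; an arbitrary initial function would not in general give a monotone sequence.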

The first procedure, although a more natural one, seems more difficult
to treat rigorously and we shall not consider it here. In dealing with
various types of continuous processes, such as those furnished by the
calculus of variations, it would seem, however, that this technique is
required for successive approximations. We shall discuss this topic again
in Chapter IX.
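
For a finite problem the first procedure, alternating between computing the return of a policy and choosing a new maximizing policy, can at least be tried numerically. The following sketch makes the same hypothetical assumptions as a discretized allocation process with integer resources, immediate return g(y) + h(x - y), and invented functions and factors; it says nothing about the rigorous treatment in the continuous case.

```python
import math
from typing import List

X_MAX = 20          # resources are restricted to the integers 0, 1, ..., X_MAX
A, B = 0.5, 0.75    # hypothetical depletion factors a and b


def stage_return(x: int, y: int) -> float:
    """Immediate return g(y) + h(x - y), with hypothetical g and h."""
    return math.sqrt(y) + 0.8 * math.sqrt(x - y)


def next_state(x: int, y: int) -> int:
    """Resources left after allocating y to the first activity, rounded down."""
    return int(A * y + B * (x - y))


def evaluate(policy: List[int]) -> List[float]:
    """Return from a fixed policy q(x), found by forward substitution."""
    f = [0.0] * (X_MAX + 1)
    for x in range(1, X_MAX + 1):
        y = policy[x]
        f[x] = stage_return(x, y) + f[next_state(x, y)]
    return f


def maximizing_policy(f: List[float]) -> List[int]:
    """For each x, an allocation y maximizing the return measured with f."""
    return [
        max(range(x + 1), key=lambda y: stage_return(x, y) + f[next_state(x, y)])
        for x in range(X_MAX + 1)
    ]


policy = list(range(X_MAX + 1))   # q_0: allocate everything to the first activity
returns = [evaluate(policy)]
for _ in range(50):
    new_policy = maximizing_policy(returns[-1])
    if new_policy == policy:
        break
    policy = new_policy
    returns.append(evaluate(policy))

print(len(returns), returns[-1][X_MAX])
```

Each pass produces both a new policy and its return, which are the two sequences $\{q_N(p)\}$ and $\{f_N(p)\}$ of the text.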


⁵ Actually this type of approximation is tacitly encountered in other branches
of analysis as, for instance, in the theory of differential equations, where a
differential equation is frequently replaced by a difference equation for approximation
purposes. This replaces the space of general functions by the subspace of
step-functions.
Exercises and Research Problems for Chapter III


1. Suppose that we are given the information that a ball is in one of $N$
boxes, and the a priori probability, $p_k$, that it is in the $k$-th box. Show that
the procedure which minimizes the expected time required to find the
ball consists of looking in the most likely box first.
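
A brute-force check on a small instance can make the claim of Exercise 1 plausible before a proof is attempted. The sketch assumes that each examination takes one unit of time and always reveals whether the ball is present; the probabilities are hypothetical.

```python
from itertools import permutations
from typing import Sequence


def expected_looks(order: Sequence[int], probs: Sequence[float]) -> float:
    """Expected number of boxes opened when they are examined in the given order."""
    return sum((position + 1) * probs[box] for position, box in enumerate(order))


probs = [0.1, 0.4, 0.2, 0.3]   # hypothetical a priori probabilities
best = min(permutations(range(len(probs))), key=lambda o: expected_looks(o, probs))
most_likely_first = tuple(sorted(range(len(probs)), key=lambda k: -probs[k]))
print(best == most_likely_first)
```

This is a numerical check on one instance, not a proof.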


2. Consider the more general process where the time consumed in
examining the $k$-th box is $t_k$, and where there is a probability $q_k$ that any
particular examination of the $k$-th box will yield no information concerning
its contents. When this happens, we continue the search operation with
the information already available.

Let $f(p_1, p_2, \ldots, p_N)$ be the expected time required to obtain the ball
using an optimal policy. Show that this function satisfies the equation


. -  J,r  + . 0 . H 

where $p_i^* = p_i/(1 - p_k)$ and the 0 occurs in the $k$-th place.


3. Prove that if we wish to obtain the ball, the optimal policy consists of
examining the box for which $p_k(1 - q_k)/t_k$ is a maximum first. On the
other hand, if we merely wish to locate the box containing the ball in the
minimum expected time, the box for which this quantity is a maximum
is examined first, or not at all.


4. Consider the situation in which we can simultaneously perform
operations which locate the ball within given sets of boxes.

5. We have a number of coins, all of the same weight except for one
which is of different weight, and a balance. Determine the weighing
procedures which minimize the maximum time required to locate the
distinctive coin in the following cases

a. The coin is known to be heavier.

b. It is not known whether the coin is heavier or lighter.

6. Determine the weighing procedures which minimize the expected
time required to locate the coin.

7. Consider the more general problem where there are two or more
distinctive coins, under various assumptions concerning the properties of the
distinctive coins. (Cairns)

8. We are given $n$ items, not all identical, which must be processed
through a number of machines, $m$, of different type. The order in which the
machines are to be used is not immaterial, since some processes must be
carried out before others. Given the times required by the $i$-th item on the
$j$-th machine, $a_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, we wish to determine
the order in which the items should be fed into the machines so as to
minimize the total time required to complete the lot.

Consider the case where there are only two stages with $a_{i1} = a_i$ and
$a_{i2} = b_i$, and where the machines must be used in this order. Let

$f(a_1, b_1, a_2, b_2, \ldots, a_N, b_N; t)$ = time consumed processing the $N$ items
     with required times $a_i$, $b_i$ on the first and second machines when
     the second machine is committed for $t$ hours ahead, and an optimal
     scheduling procedure is employed.

Prove that $f$ satisfies the functional equation

$f(a_1, b_1, a_2, b_2, \ldots, a_N, b_N; t) = \min_i [a_i + f(a_1, b_1, a_2, b_2, \ldots, 0, 0, \ldots, a_N, b_N; b_i + \max(t - a_i, 0))]$,

where the $(0, 0)$ combination is in place of $(a_i, b_i)$.

9. Show that an optimal ordering is determined by the following rule:
Item $i$ precedes item $j$ if $\min(a_i, b_j) < \min(a_j, b_i)$. If there is equality,
either ordering is optimal, provided that it is consistent with all the
definite preferences. (Johnson)

What is the solution if either machine can be used first?
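
The rule of Exercise 9 can be tried against exhaustive search on a small instance. The sketch below uses the familiar two-list form of the rule (items with $a_i < b_i$ first, in order of increasing $a_i$, then the remaining items in order of decreasing $b_i$), which yields an ordering consistent with the stated preferences; the job data are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Job = Tuple[int, int]  # (a_i, b_i): times on the first and second machines


def makespan(jobs: Sequence[Job]) -> int:
    """Total time to complete the lot when jobs are fed in the given order."""
    end_first = end_second = 0
    for a, b in jobs:
        end_first += a
        end_second = max(end_second, end_first) + b
    return end_second


def johnson_order(jobs: Sequence[Job]) -> List[Job]:
    """Jobs with a_i < b_i first by increasing a_i, the rest by decreasing b_i."""
    front = sorted((j for j in jobs if j[0] < j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] >= j[1]), key=lambda j: -j[1])
    return front + back


jobs = [(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]   # hypothetical data
best = min(makespan(order) for order in permutations(jobs))
print(makespan(johnson_order(jobs)), best)
```

Agreement on one instance is of course no substitute for the proof asked for in the exercise.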

10. Let $x_i$ be the inactive time in the second machine immediately before
the $i$-th item is processed on the second machine. Let $a_i$, $b_i$ be the times
required to process the $i$-th item on the first and second machines
respectively and assume that the items are arranged in numerical order. Then

$\sum_{i=1}^{n} x_i = \max_{1 \le u \le n} \left[ \sum_{i=1}^{u} a_i - \sum_{i=1}^{u-1} b_i \right]$  (Johnson)
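
The idle-time formula of Exercise 10 can be compared with a direct simulation of the two machines. This is a sketch on hypothetical data; the two functions below are illustrative names, and the comparison is a check on examples rather than a derivation.

```python
from typing import Sequence, Tuple

Job = Tuple[int, int]  # (a_i, b_i)


def idle_by_simulation(jobs: Sequence[Job]) -> int:
    """Total inactive time of the second machine, found by stepping through the jobs."""
    end_first = end_second = idle = 0
    for a, b in jobs:
        end_first += a
        idle += max(end_first - end_second, 0)   # x_i: wait before item i starts
        end_second = max(end_second, end_first) + b
    return idle


def idle_by_formula(jobs: Sequence[Job]) -> int:
    """Max over u of (a_1 + ... + a_u) - (b_1 + ... + b_{u-1})."""
    sum_a = sum_b = 0
    candidates = []
    for a, b in jobs:
        sum_a += a
        candidates.append(sum_a - sum_b)
        sum_b += b
    return max(candidates)


jobs = [(1, 2), (3, 6), (6, 6), (7, 5), (5, 2)]   # hypothetical data
print(idle_by_simulation(jobs), idle_by_formula(jobs))
```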

11. For the three-stage process the corresponding expression for the total
idle time on the third machine is

$\max_{1 \le u \le v \le n} \left[ \sum_{i=1}^{u} a_i - \sum_{i=1}^{u-1} b_i + \sum_{i=1}^{v} b_i - \sum_{i=1}^{v-1} c_i \right]$  (Johnson)

12. Consider the following problem arising in the production of many-part
items, or alternately in the maintenance of a complex system. There
are $N$ different stages of production involved in turning out the final item.
The probability that the item is processed correctly at the $i$-th stage is $p_i$.
Assume that $k$ machines are available which can be used to increase the
accuracy of any particular stage of the process in the following way. If
one machine is added to the $i$-th stage, $p_i$ becomes $p_{i1}$, if two machines,
then $p_{i2}$, and so on.

How should we distribute the machines to maximize the overall accuracy
of the process? Consider the same problem under the following
alternative assumptions.

(a) At most machines are allowed at the $i$-th stage.

(b) A machine at the $i$-th stage costs $d_i$ dollars and we have at most $d$
dollars to spend.

(c) A machine at the $i$-th stage requires $h_i$ operators at the $i$-th point, and
at most $h$ men are available.
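
One way to begin on the unconstrained form of Exercise 12 is a recurrence over stages. The sketch below rests on assumptions not stated in the exercise: the overall accuracy is taken to be the product of the stage probabilities, machines may be left unused, and the table of probabilities is invented for illustration.

```python
from functools import lru_cache

# Hypothetical data: ACCURACY[i][j] is the probability that stage i is
# processed correctly when j machines are assigned to it.
ACCURACY = (
    (0.80, 0.90, 0.95),
    (0.70, 0.85, 0.90),
    (0.90, 0.95, 0.97),
)


@lru_cache(maxsize=None)
def best_accuracy(stage: int, machines: int) -> float:
    """Largest product of stage probabilities from this stage on, with the machines left."""
    if stage == len(ACCURACY):
        return 1.0
    options = ACCURACY[stage]
    most = min(machines, len(options) - 1)
    return max(
        options[j] * best_accuracy(stage + 1, machines - j)
        for j in range(most + 1)
    )


print(best_accuracy(0, 2))
```

The state consists of the current stage and the number of machines still available, and the variants (a) to (c) would add further restrictions on the admissible j.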

13. A mistake found at the $i$-th point requires a time $t_i$ and a cost $c_i$ to
rectify. Taking into account laboring costs, machine costs, and the cost of
turning out a defective item, say $z$, how much money should be spent on
checking equipment and how should it be used?

14. Consider the problem of maximizing the function

$\sum_{i=1}^{n} \varphi_i(x_i)$ under the constraints

a. $x_i \ge 0$,

b. $\sum_{i=1}^{n} x_i = c$,

c. $x_{i_k} x_{i_k + 1} = 0$ for a set of integers $i_1 < i_2 < \ldots < i_m$,
   $m \le n - 1$.

Consider, in particular, the cases

a. $x_i x_{i+1} = 0$, $i = 1, 2, \ldots, n - 1$,

b. $x_i x_{i+1} x_{i+2} = 0$, $i = 1, 2, \ldots, n - 2$.

Consider the reverse situation, where we have constraints of the form
a. $x_{i_k} x_{i_k + 1} \ge d_k$.

Discuss the special cases

a. $x_i x_{i+1} \ge 1$,

b.  ri  +  1. 

15. A manager of a restaurant has two types of laundry service available
for napkins, a quick service which requires $q$ days, and costs $c$ cents per
napkin, and a slow service which requires $p > q$ days and costs $d$ cents,
$d < c$, per napkin. Assuming that he knows in advance the number of
customers he will have on any given day of an $N$-day period and that
he prides himself on providing every customer with a napkin, how many
napkins should he purchase and how should he launder them so as to
minimize the total cost over the $N$-day period? Consider first the cases
where $p = q + 1$, and $p = q + 2$.

16. Consider the analogous problem under the assumption that $k$
launderings wear out a napkin.

17. Consider the above problem under the assumption that the number of
customers on each day is a stochastic quantity.

18. We have a resource $x$ which may be utilized in a number of ways. If
$y$ is a parameter specifying a particular use, let $R(x, y)$ be the immediate
return, and $D(x, y)$ the cost in resources. If $f(x)$ is the total return from
repeated use of an initial resource $x$, obtained using an optimal allocation
policy, we derive the functional equation

$f(x) = \max_y [R(x, y) + f(x - D(x, y))]$.

Assuming that $D(x, y)$ is small compared to $x$, for all $y$, show that we
obtain the formal approximate equation

$f'(x) = \max_y \frac{R(x, y)}{D(x, y)}$,

and give the interpretation of this result.
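
A formal route to the approximate equation of Exercise 18 runs as follows. It is a sketch only, assuming that $f$ is differentiable, that $D(x, y) > 0$, and that terms of second order in $D$ may be dropped.

```latex
\begin{align*}
f(x) &= \max_y \bigl[ R(x,y) + f\bigl(x - D(x,y)\bigr) \bigr] \\
     &\approx \max_y \bigl[ R(x,y) + f(x) - D(x,y)\, f'(x) \bigr], \\
0    &= \max_y \bigl[ R(x,y) - D(x,y)\, f'(x) \bigr], \\
f'(x) &= \max_y \frac{R(x,y)}{D(x,y)} \qquad \bigl(D(x,y) > 0\bigr).
\end{align*}
```

The last step uses the fact that the bracket in the third line is non-positive for every $y$ and vanishes for a maximizing $y$, so that $f'(x)$ is at least $R/D$ for every $y$ and equal to it for at least one.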

19.  Consider  the  stochastic  case.  Show  that  the  corresponding  functional 
equation  has  the  form 


    $f(x) = \max_y \left[ \int z \, dR(y, z, x) + f\left( x - \int w \, dD(y, w, x) \right) \right]$,

and the approximate equation has the form

    $f'(x) = \max_y \dfrac{\int z \, dR(y, z, x)}{\int w \, dD(y, w, x)}$,


and  give  the  interpretation  of  the  result. 

20.  Consider  the  application  of  approximation  in  policy  space  to  the 
functional  equation 

    $f(x) = \max_{0 \le y \le x} \, [g(y) + h(x - y) + f(ay + b(x - y))]$.

We choose an initial $y_0(x)$ and compute $f_0(x)$. Then determine $y_1(x)$ by
the condition that $y_1$ maximize the function $g(y) + h(x - y) + f_0(ay + b(x - y))$,
compute $f_1(x)$ using $y_1(x)$, and so on. When are the
elements of the sequences $\{y_n(x)\}$ and $\{f_n(x)\}$ continuous in x, and when
do they converge? Consider, in particular, the cases where g and h are
both convex, or both concave.


21.  Assume  that  we  have  two  machines,  unimaginatively  called  I  and  II, 
with  the  following  properties.  If  machine  I  is  used  there  is  a  probability  r 
of receiving a gain of one unit; if machine II is used, there is a probability
s of receiving a gain of one unit. We shall assume that s is known, but that
r is determined only by an a priori probability distribution. The problem
is to determine a selection policy which maximizes the expected return
obtained over N trials, or alternatively the discounted return from an
unbounded process, discounting the return one stage hence by a factor
a < 1.

Assume  that  the  distribution  function  for  r  after  m  successes  and  n 
failures  on  the  first  machine  is  given  by 

    $dF_{m,n}(r) = \dfrac{r^m (1 - r)^n \, dF(r)}{\int_0^1 r^m (1 - r)^n \, dF(r)}$.

Let $f_{m,n}$ equal the expected return obtained using an optimal policy
for an unbounded process after the first machine has had m successes
and n failures. Show that $f_{m,n}$ satisfies the recurrence relation

    $f_{m,n} = \max \begin{cases} \text{I:} & \int_0^1 r \, dF_{m,n}(r) \, [1 + a f_{m+1,n}] + a \left[ 1 - \int_0^1 r \, dF_{m,n}(r) \right] f_{m,n+1}, \\ \text{II:} & s/(1 - a). \end{cases}$


22. Prove that there is a unique bounded solution to this equation, which
may be obtained by successive approximations.


23. Prove that for each $m, n \ge 0$ there is a unique quantity s(m, n) with
the property that the sequence $\{f_{mn}\}$ is determined by the equations

(a)  $f_{mn} = s/(1 - a)$,  $1 \ge s \ge s(m, n)$,

     $f_{mn} = \int_0^1 r \, dF_{m,n}(r) \, [1 + a f_{m+1,n}] + a \left( 1 - \int_0^1 r \, dF_{m,n}(r) \right) f_{m,n+1}$,  $0 \le s \le s(m, n)$.

The sequence s(m, n) has the following properties

(b)  $s(m + 1, n) \ge s(m, n) \ge s(m, n + 1)$,

and

(c)  $f_{m+1,n} \ge f_{m,n} \ge f_{m,n+1}$.

How  can  the  sequence  s  (m,  n)  be  calculated  ? 

24.  Prove  the  corresponding  results  for  the  process  allowing  only  a  finite 
number  of  trials. 
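
For the process with a finite number of trials, the recurrence can be evaluated directly once an a priori distribution is fixed. The following is a minimal sketch, assuming a uniform a priori distribution for r (so that the expected value of r after m successes and n failures is (m + 1)/(m + n + 2)) and no discounting. The function name `bandit_value` and its arguments are illustrative and do not come from the text.

```python
from fractions import Fraction
from functools import lru_cache


def bandit_value(trials, s, m=0, n=0):
    """Maximum expected gain over `trials` plays, assuming a uniform prior on r.

    s is the known success probability of machine II; m and n count the
    successes and failures already observed on machine I.
    """

    @lru_cache(maxsize=None)
    def f(k, m, n):
        if k == 0:
            return Fraction(0)
        # Posterior mean of r under the assumed uniform prior.
        p = Fraction(m + 1, m + n + 2)
        use_first = p * (1 + f(k - 1, m + 1, n)) + (1 - p) * f(k - 1, m, n + 1)
        # Machine II yields s in expectation and teaches nothing about r.
        use_second = s + f(k - 1, m, n)
        return max(use_first, use_second)

    return f(trials, m, n)
```

With two trials and s = 1/2, trying machine I first is worth more than playing machine II twice, because the outcome of the first trial informs the second choice.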


25. Consider the following situation. We have a warehouse with fixed
capacity  and  an  initial  stock  of  a  certain  product  which  is  subject  to 
known  seasonal  price  and  cost  variations.  The  problem  is  to  determine 
the  optimal  pattern  of  purchasing  (or  production),  storage  and  sales. 

Let  B  denote  the  fixed  warehouse  capacity,  and  A  the  initial  stock  in 
the  warehouse.  Consider  a  seasonal  product  bought  (or  produced)  and 
sold for each of i = 1, 2, ..., n periods. For the i-th period, let


(1)  $c_i$ = cost per unit

     $p_i$ = selling price per unit

     $x_i$ = amount bought (or produced)

     $y_i$ = amount sold


The  constraints  are  as  follows: 

(2)  (a)  Buying Constraints: The stock on hand at the end of the
          i-th period cannot exceed the warehouse capacity.

     (b)  Selling Constraints: The amount sold in the i-th period
          cannot exceed the amount available at the end of the
          (i - 1)-th period.

(c)  Non-negativity:  Amounts  purchased  or  sold  in  any 
period  are  non-negative. 

The  problem  is  to  determine  the  policy  which  maximizes  the  over-all 
profit. 

Show that it may be converted into the problem of determining the $x_i$
and $y_i$ which maximize

(3)    $P = \sum_{j=1}^{n} (p_j y_j - c_j x_j)$,

subject to the constraints

(4)  (a)  $A + \sum_{j=1}^{i} (x_j - y_j) \le B$,  $i = 1, 2, \ldots, n$,

     (b)  $y_i \le A + \sum_{j=1}^{i-1} (x_j - y_j)$,  $i = 1, 2, \ldots, n$,

     (c)  $x_i, y_i \ge 0$.

26. For fixed B, define

    $f_n(A) = \max P$,  $n = 1, 2, \ldots$.

Show that $f_1(A) = p_1 A$, and that

(1)    $f_n(A) = \max_{x_1, y_1} \, [p_1 y_1 - c_1 x_1 + f_{n-1}(A + x_1 - y_1)]$,

for $n \ge 2$, where the maximum is over the region

(2)  (a)  $0 \le y_1 \le A$,

     (b)  $x_1 - y_1 \le B - A$,

27. Prove that the function $f_N(v)$ is linear in v, namely

    $f_N(v) = K_N(p_1, p_2, \ldots, p_N; c_1, c_2, \ldots, c_N) + L_N(p_1, p_2, \ldots, p_N; c_1, c_2, \ldots, c_N) \, v$,

and  thus  that  the  optimal  policy  is  independent  of  v. 

(Dreyfus) 
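
The warehouse recurrence can be checked numerically on small cases. The sketch below is an illustration under stated assumptions: quantities are restricted to integers, periods are indexed forward in time from 0, and the name `warehouse_value` is hypothetical. It enumerates the amount sold (at most the stock on hand) and the amount bought (at most the room left after selling) in each period.

```python
from functools import lru_cache


def warehouse_value(prices, costs, initial_stock, capacity):
    """Maximum over-all profit for integer purchases and sales (a sketch)."""
    periods = len(prices)

    @lru_cache(maxsize=None)
    def f(i, stock):
        if i == periods:
            return 0
        best = None
        for sold in range(stock + 1):          # cannot sell more than is on hand
            room = capacity - (stock - sold)   # end-of-period stock must fit
            for bought in range(room + 1):
                value = (prices[i] * sold - costs[i] * bought
                         + f(i + 1, stock - sold + bought))
                if best is None or value > best:
                    best = value
        return best

    return f(0, initial_stock)
```

Evaluating the function for every initial stock from 0 to the capacity gives equally spaced values in the small cases tried, which is consistent with the linearity asserted in Exercise 27.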


28.  Consider  the  following  idealized  transportation  system 


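[Figure: a chain of stages k = 1, 2, 3, ..., N. Each stage has two terminals, $T_k$ and $S_k$; routes run from each of $T_k$, $S_k$ to each of $T_{k+1}$, $S_{k+1}$, and from $T_N$ and $S_N$ to the final terminal F.]

At each stage we have two terminals $T_k$ and $S_k$. From either $T_k$ or $S_k$
we can ship materials to $T_{k+1}$ or $S_{k+1}$.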

The  maximum  amounts  we  can  ship  along  these  routes  are  the  following 


a.  $T_k \to T_{k+1} = R_{k,k+1}$,    $T_N \to F = R_N$

b.  $T_k \to S_{k+1} = R'_{k,k+1}$,   $S_N \to F = S_N$

c.  $S_k \to S_{k+1} = S_{k,k+1}$

d.  $S_k \to T_{k+1} = S'_{k,k+1}$

Starting with initial quantities x at $T_k$ and y at $S_k$, denote by $F_k(x, y)$
the quantity arriving at F using an optimal shipping policy. Show that

    $F_N(x, y) = \min(x, R_N) + \min(y, S_N)$,

    $F_k(x, y) = \max F_{k+1}(z_1 + w_2, z_2 + w_1)$,

where the maximum is taken over the region

    $z_1 + z_2 \le x$,  $w_1 + w_2 \le y$,

    $0 \le z_1 \le R_{k,k+1}$,  $0 \le z_2 \le R'_{k,k+1}$,

    $0 \le w_1 \le S_{k,k+1}$,  $0 \le w_2 \le S'_{k,k+1}$.

29. Formulate the corresponding problem for the case where the terminals
have maximum capacities.

30. Consider the stochastic case where the capacities are random variables
with known distribution functions. Obtain a recurrence relation for
the maximum expected quantity arriving at F, under various assumptions
concerning the information pattern.

31. Consider the following transportation problem. We are given a number
of "sources", $S_1, S_2, \ldots, S_M$, and a number of "sinks" or "terminals",
$T_1, T_2, \ldots, T_N$. Each source $S_i$ has a quantity $x_i$ of resources which must
be transported to various terminals in such a way that the total quantity
arriving at $T_j$ fulfills an a priori demand $y_j$. It is assumed that
$\sum_i x_i = \sum_j y_j$. Given the distances, $d_{ij}$, between the sources and the
terminals, and assuming that the cost of shipping a unit quantity of resources
between $S_i$ and $T_j$ is equal to $d_{ij}$, we wish to determine the routing which
minimizes the total cost of supplying the demands.

Show that the problem above is equivalent to minimizing the sum

    $C = \sum_{i,j} d_{ij} x_{ij}$

subject to the constraints

    $\sum_j x_{ij} = x_i$,  $\sum_i x_{ij} = y_j$,  $x_{ij} \ge 0$.  (Hitchcock-Koopmans)

32. Write, for fixed $y_1, y_2, \ldots, y_N$,

    $\min_{x_{ij}} C = f_N(x_1, x_2, \ldots, x_M)$.

Show that

    $f_1(x_1, x_2, \ldots, x_M) = d_{11} x_1 + d_{21} x_2 + \cdots + d_{M1} x_M$,

    $f_N(x_1, x_2, \ldots, x_M) = \min \left[ \sum_{i=1}^{M} d_{iN} x_{iN} + f_{N-1}(x_1 - x_{1N}, x_2 - x_{2N}, \ldots, x_M - x_{MN}) \right]$,

where the minimum is over the region

    $\sum_{i=1}^{M} x_{iN} = y_N$,  $0 \le x_{iN} \le x_i$.

33. Show that as a consequence of the relation $\sum_{i=1}^{M} x_i = \sum_{j=1}^{N} y_j$, we may
always reduce the dimension of the problem by one, by writing

    $f_N(x_1, x_2, \ldots, x_M) = J_N(x_1, x_2, \ldots, x_{M-1})$.

34. Consider the stochastic case where the $d_{ij}$ represent random variables
with given distributions.

35. Assuming that the cost of transportation from an i-port to a j-port is
quadratically nonlinear, $d_{ij} x_{ij} + e_{ij} x_{ij}^2$, $e_{ij} > 0$, show that there is now
a unique minimizing schedule. (Prager)

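36. Consider a similar multi-stage process where resources at $(A_k, B_k, C_k)$
must be transported to $(A_{k+1}, B_{k+1}, C_{k+1})$ and so on, until reaching
assigned destinations, $T_1, T_2, T_3$, as indicated below

[Figure: three parallel chains of terminals, $A_1, A_2, \ldots, A_N$, $B_1, B_2, \ldots, B_N$ and $C_1, C_2, \ldots, C_N$, ending at the destinations $T_1$, $T_2$ and $T_3$.]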


37. Consider the problem of determining the minimum of

    $L(x) = \sum_{i=1}^{N} c_i x_i$,

subject to the constraints

    $\sum_{j=1}^{N} a_{ij} x_j \le b_i$,  $i = 1, 2, \ldots, M$,

    $x_i \ge 0$,

where we assume that $a_{ij} \ge 0$.

Denote Min L(x) by $f_N(b_1, b_2, \ldots, b_M)$. Show that

    $f_N(b_1, b_2, \ldots, b_M) = \min_{x_N} \, [c_N x_N + f_{N-1}(b_1 - a_{1N} x_N, b_2 - a_{2N} x_N, \ldots, b_M - a_{MN} x_N)]$,

where $x_N$ is constrained by the relations

    $x_N \ge 0$,  $x_N \le \min_i \, (b_i / a_{iN})$.

38. Suppose that we have an empty five-gallon jug, $J_1$, an empty two-gallon
jug, $J_2$, and unlimited supplies of usquebaugh and water. The
allowable operations are

$A_1$  Fill $J_1$
$A_2$  Empty $J_1$ of any contents
$A_3$  Fill $J_2$
$A_4$  Empty $J_2$ of any contents
$A_5$  Pour contents of $J_1$ into $J_2$, as much as allowable
$A_6$  Pour contents of $J_2$ into $J_1$, as much as allowable.

After any finite number of operations, the state of the system may be
described as follows:

1. There are i = 0, 1, 2 gallons of liquid in $J_2$, with a
   ratio r : (1 - r) of usquebaugh to water.

2. There are j = 0, 1, 2, 3, 4, 5 gallons in $J_1$, with a
   ratio s : (1 - s) of usquebaugh to water.

Starting in some initial state (i, j; r, s), let f(i, j; r, s) denote the minimum
number of operations required to attain a given state, say a fifty-fifty
mixture of water and usquebaugh in $J_1$.

Show that

    $f(i, j; r, s) = 1 + \min_{1 \le k \le 6} A_k f$.

Is f(i, j; r, s) finite for all rational r, with j = 0, and all rational s,
with i = 0? If not, what final combinations of water and usquebaugh
can be attained in $J_1$ in a finite number of operations?


39. Consider the following problem: At each stage of a sequence of actions
we are allowed our choice of one of two actions. The first has associated a
probability $p_1$ of gaining one unit, a probability $p_2$ of gaining two units,
and a probability $p_3$ of terminating the process. The second has a similar
set of probabilities $p_1'$, $p_2'$, $p_3'$. What sequence of choices maximizes the
probability of attaining at least n units before the process is terminated?

Let u(n) be the maximum probability. Then

    $u(n) = \max \begin{cases} p_1 u(n - 1) + p_2 u(n - 2), \\ p_1' u(n - 1) + p_2' u(n - 2), \end{cases}$  $n = 1, 2, \ldots$,

    $u(n) = 1$,  $n \le 0$.
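
The recurrence for u(n) is a two-term forward iteration. A minimal sketch follows, assuming the boundary condition u(n) = 1 for n ≤ 0; the function name `max_reach_probability` and the pair arguments `first = (p1, p2)` and `second = (p1', p2')` are illustrative, not from the text.

```python
from fractions import Fraction


def max_reach_probability(n, first, second):
    """Maximum probability of gaining at least n units before termination."""
    u_prev2, u_prev1 = Fraction(1), Fraction(1)  # u(-1) and u(0)
    for _ in range(n):
        # Pick whichever action gives the larger continuation probability.
        u_next = max(p1 * u_prev1 + p2 * u_prev2 for p1, p2 in (first, second))
        u_prev2, u_prev1 = u_prev1, u_next
    return u_prev1
```

Recording which action attains the maximum at each n would give the optimal sequence of choices as well as the probability.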


40. Prove that if

(1)    $u(n) = \max_{1 \le i \le K} \left[ \sum_{j=1}^{R} a_{ij} u(n - j) \right]$,  $n \ge R$,

       $u(l) \ge 0$,  $l = 0, 1, 2, \ldots, R - 1$,

and if

(a)  $a_{ij} \ge 0$,

(b)  there is one equation, $r^R = \sum_{j=1}^{R} a_{kj} r^{R-j}$, whose largest
     positive root is greater than the corresponding roots of
     the other equations of this type;

(c)  for this index k, $a_{k1} \ne 0$,

under these circumstances, the solution of (1) is given by

    $u(n) = \sum_{j=1}^{R} a_{kj} u(n - j)$

for n sufficiently large.

What happens if at least two characteristic equations have the same
maximum root?

41. Consider the equation

    $u(n) = \max_{1 \le i \le m} \left[ \sum_{j=1}^{R} a_{ij} u(n - j) + g_i \right]$,

where $a_{ij} \ge 0$, $\sum_{j=1}^{R} a_{ij} = 1$, $g_i \ge 0$, $u(l) \ge 0$, $l = 0, 1, 2, \ldots, R - 1$.

Let $c = \max_i \, g_i / \sum_{j=1}^{R} j a_{ij}$ be attained for the single value
i = s. If $a_{s1} > 0$, the solution is given by

    $u(n) = \sum_{j=1}^{R} a_{sj} u(n - j) + g_s$

for $n \ge n_0$ where $n_0$ depends upon the initial conditions and coefficients.

42. Is the result true if $a_{s1} = 0$? Construct a counter-example.

43. Given a finite set $\{A_i\}$ of non-negative square matrices, let $C_N$ be the
matrix $B_1 B_2 \cdots B_N$, where each $B_i$ is an $A_j$, which possesses the
characteristic root of largest absolute value. Let $r_N$ be this root. Prove that
$\mu = \lim_{N \to \infty} r_N^{1/N}$ exists. Let $M_N$ denote the smallest majorant of the products
$P_N = B_1 B_2 \cdots B_N$; i.e., the ij-th element in $M_N$ is greater than or equal to
the ij-th element in any $P_N$. Let $m_N$ be the characteristic root of $M_N$ of
largest absolute value. Prove that $\lambda = \lim_{N \to \infty} m_N^{1/N}$ exists.


44. Prove or disprove that $\mu = \lambda$.

45. Consider the following problem: We are given initially x dollars and
a quantity y of a serum, together with the prerogative of purchasing
additional amounts of the serum at specified times $t_1 < t_2 < \cdots$. At the
k-th purchasing opportunity, $t_k$, a quantity $c_k z$ of serum may be purchased
for z dollars, where $c_k$ is a monotone-increasing function of k. Given the
probability that an epidemic occurs between $t_k$ and $t_{k+1}$, and the condition
that if an epidemic occurs we may only use the amount of serum on
hand, the problem is to determine the purchasing policy that maximizes
the over-all probability of successfully combating an epidemic, given the
probability of success with a quantity w of serum available.

The condition $c_k > c_{k-1}$ is imposed to indicate the cheaper cost of
serum at a later date because of technological improvement. Let

$p_k$ = probability that the epidemic occurs between $t_k$ and
$t_{k+1}$, assuming that it has not occurred previously,

$\varphi(w)$ = probability of combating the epidemic successfully with
a quantity w of serum,

$f_k(x, y)$ = over-all probability of success using an optimal purchasing
policy from $t_k$ on, given x dollars and a quantity
y of serum on hand.

Show that $f_k(x, y)$ satisfies the functional equation

    $f_k(x, y) = \max_z \, [p_k \varphi(y + c_k z) + (1 - p_k) f_{k+1}(x - z, y + c_k z)]$.

46. Show that if $\varphi(w)$ is convex for all values of w which occur, the optimal
policy consists of purchasing no serum at $t_1, \ldots, t_{k-1}$ and then
using all available money at $t_k$ where k is chosen so as to maximize

    $[1 - (1 - p)^{k-1}] \, \varphi(y) + (1 - p)^{k-1} \varphi(y + c_k x)$,

if $p_k = p$. Find the corresponding expression for general $p_k$.


47. Let

    $F_N(f) = \min_{\{a_k\}} \int \left| f - \sum_{k=1}^{N} a_k \varphi_k \right|^n dx$.

Show that

    $F_N(f) = \min_{a_N} F_{N-1}(f - a_N \varphi_N)$.

48. Show that if we let

$m(x_1, x_2, \ldots, x_N)$ be the minimum of N quantities,
$x_1, x_2, \ldots, x_N$, we have the functional equation

    $m(m(x_1, x_2, \ldots, x_{N-1}), x_N) = m(x_1, x_2, \ldots, x_N)$,

and similarly for $M(x_1, x_2, \ldots, x_N)$, the maximum of N quantities.
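
This functional equation is what allows a minimum of N quantities to be computed by folding the two-argument minimum over the list. A small sketch, with the hypothetical names `nested_min` and `nested_max`:

```python
from functools import reduce


def nested_min(values):
    """m(x_1, ..., x_N) computed as m(m(x_1, ..., x_{N-1}), x_N)."""
    return reduce(min, values)


def nested_max(values):
    """The same folding applied to the maximum M(x_1, ..., x_N)."""
    return reduce(max, values)
```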


49. Show that

    $\max_{\{x_i\}} \left[ (1 - x_1) e^{x_1} + (1 - x_2) e^{x_1 + x_2} + \cdots + (1 - x_N) e^{x_1 + x_2 + \cdots + x_N} \right] = e_N$,

where $e_1 = 1$, $e_N = e^{e_{N-1}}$.
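
A numerical check of this identity is sketched below. It rests on a derivation not spelled out in the text: writing $f_N$ for the maximum, one stage gives $f_N = \max_x e^x (1 - x + f_{N-1})$, which is attained at $x = f_{N-1}$, so that the maximizing point is $x_k = e_{N-k}$ with the added convention $e_0 = 0$. The function names are illustrative.

```python
import math


def tower(n):
    """e_n from the recurrence e_n = exp(e_{n-1}), with the assumed e_0 = 0."""
    value = 0.0
    for _ in range(n):
        value = math.exp(value)
    return value


def objective(xs):
    """(1 - x_1) e^{x_1} + (1 - x_2) e^{x_1 + x_2} + ... for the given xs."""
    total, partial_sum = 0.0, 0.0
    for x in xs:
        partial_sum += x
        total += (1 - x) * math.exp(partial_sum)
    return total


def maximizing_point(n):
    """Candidate maximizer x_k = e_{n-k}, k = 1, ..., n (from the derivation above)."""
    return [tower(n - k) for k in range(1, n + 1)]
```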

50.  Set 

/*  (6,  k)  =  Min  O  - *>'  —  v  <x  -  <.<>•  j  rfK  (x) . 


Show  that 


/.v  (f»,  ft)  =  Min 
«.v 


(kb'  *■  a  s  ') 


/.V- 


'( 


ftft  4-  ay 

-r-rr*A  +  1 


/,  (ft,  ft)  =  Min  f  [e  -*  <'  *>‘  -<'  -«)’]  K  (x) . 

a  J  i 


51. Obtain recurrence relations for the problem of determining the minimum
and maximum of

(a)  $Q_N = (a x_1)^2 + (x_1 + a x_2)^2 + \cdots + (x_1 + x_2 + \cdots + x_{N-1} + a x_N)^2$,

     subject to $x_1^2 + x_2^2 + \cdots + x_N^2 = 1$,

(b)  $Q_N = x_1^2 + (x_1 + a x_2)^2 + \cdots + (x_1 + a x_2 + a^2 x_3 + \cdots + a^{N-1} x_N)^2$,

     subject to $x_1^2 + x_2^2 + \cdots + x_N^2 = 1$,

(c)  $Q_N = x_1^2 + (x_1 + a x_2)^2 + (x_1 + a x_2 + (a + b) x_3)^2 + \cdots$
     $+ (x_1 + a x_2 + (a + b) x_3 + \cdots + (a + (N - 2) b) x_N)^2$,

     subject to $x_1^2 + x_2^2 + \cdots + x_N^2 = 1$.

52.  Suppose  that  a  piece  of  candy  is  to  be  shared  by  two  children.  Show 
that  an  optimal  procedure  is  to  let  one  child  divide  the  candy,  and  the 
other  choose  the  piece  he  wants.  Show  that  this  leads  to  the  equation 

    $r = \max_{0 \le y \le x} \min(y, x - y) = x/2$,


for  the  share  of  the  first  child. 
53. What is the corresponding procedure for N children? (Steinhaus)

54.  Suppose  that  we  have  a  vehicle  which  can  carry  enough  gasoline  to 
go a distance of d miles. In order to traverse a distance of 2d miles over
barren territory, it is necessary to establish intermediate caches of gasoline.
How should these be located so as to minimize the total expenditure
of gasoline required to traverse this distance, and what is the total distance
travelled by the vehicle before it reaches its destination?

(N. J. Fine, "The Jeep Problem," Amer. Math. Monthly, Vol. LIV, Jan.
1947)

55.  Consider  the  following  more  realistic  versions: 

a.  Use  of  more  than  one  vehicle 

b.  Transportation  of  an  additional  cargo 

c.  Use  of  some  fixed  caches,  established  in  advance 

d.  Delivery  to  more  than  one  destination 

e.  Establishment  of  a  rate  of  delivery 

f.  Minimization  of  total  cost,  including  cost  of  gasoline, 
cost  of  purchasing  vehicles,  cost  of  establishing  caches. 

g.  Arbitrary distance x > 2d.  (Helmer)

56. Prove that, in general, the problem of determining

    $\max_{\{x_k\}} \min_{\{y_k\}} \sum_{k=1}^{N} F_k(x_k, y_k)$,  where $\sum_{k=1}^{N} x_k \le x$, $\sum_{k=1}^{N} y_k \le y$, $x_k, y_k \ge 0$,

cannot be reduced to a recurrence relation of the form

    $f_N(x, y) = \max_{x_N} \min_{y_N} \, [F_N(x_N, y_N) + f_{N-1}(x - x_N, y - y_N)]$.

57. Suppose that the requirements of a system at time n are $r_n$. Let $x_n$ be
the actual level, and let it be required to have $x_n \ge r_n$ for all n.

Furthermore, the restriction on the level at any time is

A  n  i  |  -A«  A  ( 1  n  An  j ) ,  II  S  1  , 

an "expansion-limitation".

We wish to choose the $x_i$ so as to minimize
-  *-*  *  -  -  .v 

J  ({  A  })  “  (An  C„)  . 

a  I 

Show that the $x_i$ are given by

v.  v , 

A 2  A  i  Mill  /  7  I ,  ?.  r/z  Min  A  A,,  f/>2] 

a3  v2  Min  >f !,  A  <i  i.  7’3]  Min  A  (.v2  —  a,'  t'j] 

An  An  -  ,  Mill  A"  1  '/  „  A"  2  (f  = 

Mill  A  (An  ,  An  2),  (f  n]. 
where

<pt  —  Max 


58. Determine the maximum of $\sum_{i=1}^{N} a_i x_i$ subject to the constraints
$\sum_{i=1}^{N} x_i^2 = 1$, $0 \le x_1 \le x_2 \le \cdots \le x_N$, where the $a_i$ are non-negative.

59. Consider the problem of determining the maximum of $\prod_{i=1}^{N} (x_i - a)$
subject to the restrictions $0 \le x_i \le b$, where b > a, and $\sum_{i=1}^{N} x_i = c$.

Show that to obtain a functional equation we must consider also the
problem of determining the minimum of $\prod_{i=1}^{N} (x_i - a)$, and obtain the
functional equations governing the problem.

Show that this problem does not arise if we consider

    $\prod_{i=1}^{N} | x_i - a |$.

60.  Assume  that  we  are  a  contestant  on  a  quiz  program  where  we  have 
an  opportunity  to  win  a  substantial  amount  of  money  provided  that  we 
answer  a  series  of  questions  correctly. 

Let $r_k$ be the amount of money obtained if the k-th question is answered
correctly, and let $p_k$ be the a priori probability that we can answer the
k-th question where k = 1, 2, ..., N. Let $\varphi(x)$ be the utility function
measuring the value to us of winning an amount x.

Assume that we have a choice at the end of each question of attempting
to answer the next question, or of stopping with the amount already won.
Determine the optimal policies to pursue under the following conditions:

a. Any wrong answer terminates the process with a total return of
zero.

b. A total of two wrong answers is allowed.

c. Having answered $k_0$ questions correctly, we must win at least $\sum_{k=1}^{k_0} r_k$,
no matter what happens subsequently.

d. We are competing with other contestants. The contestant obtaining
the largest total has an opportunity to answer a "jackpot question"
worth much more than $\sum_{k=1}^{N} r_k$.

e. At each stage of the process, we have a choice of a hard question or
an easy question with the proviso that a miss on an easy question
terminates the process with a return of zero, and a miss on a hard
question terminates the process with a total return of one-half the
amount won to date.

(Shepherd)
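
Condition (a) leads to a simple stop-or-continue recurrence. The sketch below is an illustration under assumptions not fixed by the text: questions are taken in a given order, the utility of a zero return is $\varphi(0)$, and the names `quiz_value`, `rewards`, `probs` and `utility` are hypothetical. At each stage it compares the utility of stopping with the expected utility of attempting the next question.

```python
from fractions import Fraction


def quiz_value(rewards, probs, utility=lambda x: x):
    """Expected utility under an optimal stopping policy, case (a) only."""

    def f(k, won):
        stop = utility(won)
        if k == len(rewards):
            return stop
        # A wrong answer ends the process with a total return of zero.
        go_on = (probs[k] * f(k + 1, won + rewards[k])
                 + (1 - probs[k]) * utility(0))
        return max(stop, go_on)

    return f(0, 0)
```

Passing a concave `utility` makes the policy stop earlier than the default linear utility does, since the risk of losing the amount already won weighs more heavily.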

61. Let the quantities $b_i$, $a_{ij}$ be stochastic variables, subject to known
distributions. Obtain a recurrence relation for the sequence

{/.V  (t,  Ci,  C,,  ....  Cm)} 

defined  by  the  equation 

/at  ( t ,  Ci,  c, . Cm)  =  Min  Exp  c*  Z  b( xt  , 

Ji  L  i  _ 1  J 

where the $x_i$ satisfy constraints of the form

a.  $x_i \ge 0$,

b.  $\sum_{j=1}^{N} a_{ij} x_j \le c_i$,  $i = 1, 2, \ldots, M$,

and  for  the  sequence 

gs  ( t ,  cu  cx,  ....  Cm)  =  Min  Exp  Z  bt  xt  . 

rt  Li  -  1  J 

In  both  cases,  Exp  represents  the  expected  value  with  respect  to  the 
random  elements. 

62. Consider the Selberg form

    $Q_N(x) = \sum_{n \le N} \left( \sum_{k \mid n} x_k \right)^2$,

where $x_1 = 1$ and the other $x_k$ are as yet undetermined. The notation
$\sum_{k \mid n} x_k$ means that the sum is to be taken over all integers k which divide
n, e.g. $\sum_{k \mid 6} x_k = x_1 + x_2 + x_3 + x_6$. With the introduction of suitable
state variables, determine recurrence relations for Min $Q_N(x)$.

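63. The problem of determining the minimum and maximum characteristic
roots of the Jacobi matrix

    $\begin{pmatrix} a_1 & b_1 & \cdot & \cdots & \cdot \\ b_1 & a_2 & b_2 & \cdots & \cdot \\ \cdot & b_2 & a_3 & \cdots & \cdot \\ \vdots & \vdots & \vdots & \ddots & b_{N-1} \\ \cdot & \cdot & \cdot & b_{N-1} & a_N \end{pmatrix}$,

where the dots signify that all the other elements are zero, is equivalent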
to determining the minimum and maximum values of the quadratic form,

    $Q_N(x) = \sum_{i=1}^{N} a_i x_i^2 + 2 \sum_{i=1}^{N-1} b_i x_i x_{i+1}$,

on the sphere $\sum_{i=1}^{N} x_i^2 = 1$.

Consider the two sequences

    $f_N(c) = \max_S \, [Q_N(x) + 2 c x_N]$,

    $g_N(c) = \min_S \, [Q_N(x) + 2 c x_N]$,

where S represents the N-dimensional sphere. Show that recurrence
relations may be obtained, connecting $f_N(c)$ with $f_{N-1}(c)$, and $g_N(c)$
with $g_{N-1}(c)$.

64. Obtain analogous results for the quadratic form

    $Q_N(x) = \sum_{i=1}^{N} a_i x_i^2 + 2 \sum_{i=1}^{N-1} b_i x_i x_{i+1} + 2 \sum_{i=1}^{N-2} c_i x_i x_{i+2}$.

65. Let $A = (a_{ij})$ be a positive definite symmetric matrix. Show that the
problem of solving the system of linear equations

    $\sum_{j=1}^{N} a_{ij} x_j = c_i$,  $i = 1, 2, \ldots, N$,

is equivalent to determining the absolute minimum of the form

    $Q_N(x) = \sum_{i,j=1}^{N} a_{ij} x_i x_j - 2 \sum_{i=1}^{N} c_i x_i$.

66. Define this minimum to be $f_N(c_1, c_2, \ldots, c_N)$, and obtain a recurrence
relation connecting $f_N$ and $f_{N-1}$.

Show that $f_N$ is a quadratic form in the variables $c_i$,

    $f_N(c_1, c_2, \ldots, c_N) = \sum_{i,j=1}^{N} b_{ij}^{(N)} c_i c_j$,

and show how the recurrence relation connecting $f_N$ and $f_{N-1}$ may be
utilized to obtain recurrence relations for the sequences $\{b_{ij}^{(N)}\}$.

67.  A  television  broadcasting  company  wishes  to  lease  video  links  so  that 
certain  of  its  stations  may  be  formed  into  a  connected  network.  Video 
links  exist  between  all  pairs  of  stations,  and  the  costs,  in  general  different, 
of  links  between  the  various  pairs  of  stations  are  known.  Show  that  to 
construct  a  network  at  minimal  cost,  we  choose  among  the  links  not  yet 
included  in  the  network  the  lowest  price  link  which  does  not  form  any 
loop  with  the  links  already  chosen.  (Kalaba) 
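
The selection rule just described can be sketched as follows. This is an illustration only, using a union-find structure to detect loops; that data structure is an implementation choice made here, not something given in the text. Stations are assumed to be numbered from 0, and the name `cheapest_network` and the dictionary `link_costs` mapping station pairs to costs are hypothetical.

```python
def cheapest_network(station_count, link_costs):
    """Pick links in order of increasing cost, skipping any that would form a loop."""
    parent = list(range(station_count))

    def find(x):
        # Follow parent pointers to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for (i, j), cost in sorted(link_costs.items(), key=lambda item: item[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the link joins two separate components
            parent[root_i] = root_j
            chosen.append((i, j, cost))
    return chosen
```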

68. Consider the problem of minimizing $\sum_{j=1}^{n} \varphi_j(x_j)$ over all n-tuples of
non-negative integers $x = (x_1, x_2, \ldots, x_n)$ which satisfy $\sum_{j=1}^{n} x_j = m$,
where $\varphi_1, \varphi_2, \ldots, \varphi_n$ are convex functions for $x_j \ge 0$. Let I = {1, 2, ..., n}
and for any admissible set, $\{x_1, x_2, \ldots, x_n\}$, let $S^+(x)$ denote the set of
indices $j \in I$ for which $x_j > 0$. Show that a necessary and sufficient condition
that an admissible set of $x_j$ provide the minimum is

    $\min_{j \in I} \, [\varphi_j(x_j + 1) - \varphi_j(x_j)] \ge \max_{j \in S^+(x)} \, [\varphi_j(x_j) - \varphi_j(x_j - 1)]$,

and obtain the corresponding condition when the $x_j$ are restricted merely
to be non-negative and satisfy $\sum_{j=1}^{n} x_j = m$. (Gross)
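
One way to explore this condition numerically is to build an allocation one unit at a time, always giving the next unit to the index with the smallest marginal increase, and then to test the stated inequality. The sketch below is an illustration with hypothetical names (`allocate`, `satisfies_condition`); it assumes each function in `phis` is convex and can be evaluated at non-negative integers.

```python
import heapq


def allocate(phis, m):
    """Distribute m units one at a time to the smallest marginal increase."""
    x = [0] * len(phis)
    heap = [(phi(1) - phi(0), j) for j, phi in enumerate(phis)]
    heapq.heapify(heap)
    for _ in range(m):
        _, j = heapq.heappop(heap)
        x[j] += 1
        heapq.heappush(heap, (phis[j](x[j] + 1) - phis[j](x[j]), j))
    return x


def satisfies_condition(phis, x):
    """Check min forward difference >= max backward difference over x_j > 0."""
    forward = min(phi(v + 1) - phi(v) for phi, v in zip(phis, x))
    backward = [phi(v) - phi(v - 1) for phi, v in zip(phis, x) if v > 0]
    return not backward or forward >= max(backward)
```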

69. Consider a rectangular matrix $A = (a_{ij})$. It is desired to start at the
(1, 1) position and proceed to the (m, n) position moving one step to the
right or one step down each move, in such a way as to minimize the sum
of the $a_{ij}$ encountered. Show how to determine optimal paths. (Dreyfus)
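
A minimal sketch of one way to determine such a path follows: work backward from the final position, recording for each cell the least sum obtainable from that cell onward, then trace the path forward. Positions are 0-indexed in the code, so the (1, 1) position of the text is `(0, 0)`; the name `min_path` is illustrative.

```python
def min_path(a):
    """Return (least sum, path) from the top-left to the bottom-right cell."""
    rows, cols = len(a), len(a[0])
    cost = [[0] * cols for _ in range(rows)]
    for i in reversed(range(rows)):
        for j in reversed(range(cols)):
            options = []
            if i + 1 < rows:
                options.append(cost[i + 1][j])  # step down
            if j + 1 < cols:
                options.append(cost[i][j + 1])  # step right
            cost[i][j] = a[i][j] + (min(options) if options else 0)

    # Trace an optimal path forward using the tabulated costs.
    path, i, j = [(0, 0)], 0, 0
    while (i, j) != (rows - 1, cols - 1):
        if i + 1 < rows and (j + 1 >= cols or cost[i + 1][j] <= cost[i][j + 1]):
            i += 1
        else:
            j += 1
        path.append((i, j))
    return cost[0][0], path
```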

70.  Suppose  that  we  have  a  toaster  capable  of  toasting  two  slices  of  bread 
simultaneously,  each  on  one  side.  What  toasting  procedure  minimizes  the 
time  required  to  toast  three  slices  of  bread,  each  on  two  sides  ? 

(J. E. Littlewood)

Solve the generalized problem requiring the processing of N k-sided items
by means of M machines which can each process R items on s sides
simultaneously.

71. Consider a 3-terminal communication system,

[Figure: three terminals, $T_1$, $T_2$, $T_3$, each linked to the other two.]

with message loads at each of the terminals for the other terminals.
Let $r_{ij}$ denote the maximum number of messages that can be sent
from $T_i$ to $T_j$ in unit time, and consider the two cases, first, where there
is no interference between signals going from $T_i$ to $T_j$ and those going
in the reverse direction from $T_j$ to $T_i$, and, second, where the total
number of messages in both directions cannot exceed $r_{ij}$.

Let $x_{ij}$, i, j = 1, 2, 3, $i \ne j$, denote the quantity of messages at $T_i$
with ultimate destination $T_j$, and assume that a unit time is consumed
transmitting a message from any $T_i$ to any $T_j$. Denoting by $f_n(x_{ij})$ the
maximum quantity of messages that can be delivered in n time units,
derive a recurrence relation for the sequence $\{f_n(x_{ij})\}$.

(Juncosa-Kalaba)

72.  A  newspaper  delivers  papers  to  a  number  of  newsstands.  Assuming 
that  the  distribution  of  sales  at  each  of  these  stands  is  known,  and 
assuming that a certain quantity of unsold papers may be returned,
suitably  discounted,  how  many  papers  should  be  published,  and  how 
should  they  be  distributed  ? 

73.  Consider  the  problem  of  minimizing  a  sum 

    $F_N(x_1, x_2, \ldots, x_N) = g_1(x_1) + g_2(x_2) + \cdots + g_N(x_N)$,

where each $g_i$ is a convex function, and the variables are subject to the
constraints $a \le x_1 \le x_2 \le \cdots \le x_N \le b$. Define

    $f_N(a, b) = \min_{\{x_i\}} F_N(x_1, \ldots, x_N)$,  for N = 1, 2, ...,

and $-\infty < a \le b < \infty$. Show that

    $f_{N+1}(a, b) = \min_{a \le x \le b} \, [g_{N+1}(x) + f_N(a, x)]$.
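
On a finite grid of admissible values the recurrence becomes a running minimum. The sketch below is an illustration under the added assumption that each variable is restricted to the points of an increasing list `grid` whose first and last entries play the roles of a and b; the name `monotone_min` is hypothetical. After processing k functions, `best[j]` holds the k-stage minimum with upper limit `grid[j]`.

```python
def monotone_min(gs, grid):
    """Minimum of g_1(x_1) + ... + g_N(x_N) with x_1 <= ... <= x_N on the grid."""
    best = [0] * len(grid)  # zero-stage value for every upper limit
    for g in gs:
        updated, running = [], float("inf")
        for j, x in enumerate(grid):
            # min over grid points up to x of g(point) + previous-stage value
            running = min(running, g(x) + best[j])
            updated.append(running)
        best = updated
    return best[-1]
```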
74. Let g(x) be continuous and convex for $x \ge d$. Define

    $g(r, s) = \min_{r \le x \le s} g(x)$,  $d \le r \le s$.

Show that for $d \le a \le b \le c$, we may write

    $g(a, c) = g(a, b) + g(b, c) - g(b, b)$.

In addition, show that g(a, x), as a function of x, is continuous and
convex for $x \ge a$. (Karush)

75. Prove that under the above hypotheses

    $f_N(a, c) = f_N(a, b) + f_N(b, c) - f_N(b, b)$,  $-\infty < a \le b \le c < \infty$.

(Karush)

76. Let the $g_i(y)$ be convex functions for $-\infty < y < \infty$ which are
bounded from below. Then $f_N(a, b)$ may be written in the form

    $f_N(a, b) = u_N(a) + v_N(b)$,  $a \le b$,

where $u_N(x)$ and $v_N(x)$ are, respectively, increasing and decreasing
convex functions for $-\infty < x < \infty$. (Karush)


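77. Let

    $f_N(a_0, a_1, a_2, \ldots, a_N) = \min_{\{x_i\}} \max_{0 \le L \le N} \left| a_L - \sum_{k=0}^{L} x_k c_{L-k} \right|$.

Show that

    $f_N(a_0, a_1, a_2, \ldots, a_N) = \min_{x_0} \max \, [\, |a_0 - x_0 c_0|, \; f_{N-1}(a_1 - x_0 c_1, a_2 - x_0 c_2, \ldots) \,]$.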


78. Derive a similar expression for

    $f_N(a_0, a_1, a_2, \ldots, a_N) = \min_{\{x_i\}} \sum_{l=0}^{N} \left( a_l - \sum_{k=0}^{l} x_k c_{l-k} \right)^2$,

and obtain thereby recurrence relations for the coefficients in

    $f_N(a_0, a_1, \ldots, a_N) = \sum_{r,s=0}^{N} q_{rs}^{(N)} a_r a_s$.


79. A sleuth investigating a murder has N witnesses, one of whom is
the murderer, of different degrees of reliability. Let $p_i$ be the probability
that the i-th witness tells the truth at any particular time to any particular
question. The detective interviews the witnesses in some order, asks
the first witness a question, and then each succeeding witness a question,
which may be a direct question or a question concerning the truth of the
testimony of preceding witnesses. Supposing that he is allowed one
question at a time, and that the time required for the i-th witness to
answer a question is $t_i$, in what order should the witnesses be interrogated,
and what questions should be asked to maximize the probability of
determining the murderer in a fixed time T?

80. Consider the problem of minimizing the function

$F_N(x_1, x_2, \ldots, x_N) = \varphi_1(x_1) + \varphi_2(x_2) + \cdots + \varphi_N(x_N)$

over all values of the $x_i$ subject to

(a) $x_i \ge 0$

(b) $x_1 \ge r_1$,
    $x_1 + x_2 \ge r_2$,
    ...
    $x_1 + x_2 + \cdots + x_N \ge r_N$.

Define the sequence

$f_k(z) = \min \sum_{i=k}^{N} \varphi_i(x_i)$,

over the region determined by

(a) $x_i \ge 0$

(b) $x_k \ge r_k - z$,
    $x_k + x_{k+1} \ge r_{k+1} - z$,
    ...
    $x_k + x_{k+1} + \cdots + x_N \ge r_N - z$,

for $z \ge 0$, $k = 1, 2, \ldots, N$. Show that

$f_k(z) = \min_{x_k \ge 0, \; x_k \ge r_k - z} \left[ \varphi_k(x_k) + f_{k+1}(z + x_k) \right]$

for $k = 1, 2, \ldots, N - 1$, and hence that $\min F_N(x_1, x_2, \ldots, x_N) = f_1(0)$.
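
The recurrence of Problem 80 lends itself to direct computation. The following Python sketch is purely illustrative and is not part of the original text: it assumes integer allocations, an artificial upper bound `cap` on each $x_k$, and zero-based indices, and the names `min_cost`, `phi`, `r` and `cap` are our own.

```python
from functools import lru_cache


def min_cost(phi, r, cap):
    """Minimize sum_k phi[k](x_k) subject to x_k >= 0 and
    x_1 + ... + x_k >= r[k-1] for every k, using
    f_k(z) = min over x_k >= max(0, r_k - z) of [phi_k(x_k) + f_{k+1}(z + x_k)].

    Assumption of this sketch: the x_k are integers in 0..cap.
    """
    n = len(phi)

    @lru_cache(maxsize=None)
    def f(k, z):
        # z is the total already allocated to the stages before stage k (zero-based).
        lowest = max(0, r[k] - z)
        best = float("inf")  # stays infinite if no feasible x_k <= cap exists
        for x in range(lowest, cap + 1):
            tail = f(k + 1, z + x) if k + 1 < n else 0
            best = min(best, phi[k](x) + tail)
        return best

    return f(0, 0)


# Hypothetical linear costs: the first stage is cheapest, so everything is bought there.
linear = min_cost([lambda x: x, lambda x: 3 * x, lambda x: 2 * x], (2, 3, 7), cap=7)

# Hypothetical identical convex costs: a total requirement of 6 is split as (2, 2, 2).
convex = min_cost([lambda x: x * x] * 3, (0, 0, 6), cap=6)
```

The single state variable $z$ carries everything the later stages need to know about the earlier allocations, which is why the table has only two indices.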

81. Show that the above problem, with the additional restriction that
Xi  4i  v,  <  d(  4-1  may  be  reduced  to  the  problem  of  determining  the 
sequence $\{f_k(z, c)\}$ as defined by

$f_k(z, c) = \min_{x_k} \left[ \varphi_k(x_k) + f_{k+1}(z + x_k, x_k) \right].$

(Management Science, Vol. 3 (1956), pp. 111-113).
82.  Consider  similarly  the  restriction  xi  i  <;  ?-\i. 

83. Determine the structure of the optimal policy in the case where
the $\varphi_i(x)$ are linear functions of $x$, $\varphi_k(x) = r_k x$, and we assume

a. $r_{k+1} > r_k$

b. $r_{k+1} < r_k$

c. the $r_i$ steadily increase, then decrease.

d. the $r_i$ steadily decrease, then increase.

(Antosiewicz-Hoffman)


84. Given a continuous convex function, $f(x)$, and two values, one
positive, $f(x_1) > 0$, and one negative, $f(x_2) < 0$, $x_1 < x_2$, we wish to
determine the position of the zero of the function in $[x_1, x_2]$. The problem
is to minimize the maximum length of interval in which we can guarantee
that the zero lies after $n$ evaluations of $f(x)$, where the evaluations are
performed sequentially.

Define $R_n(S, y)$ to be the minimum length of interval on which we
can guarantee locating the zero in $[S, 1]$ of any convex function $f$, given
that $f(0) = 1$, $f(1) = -y$, that we know that the root is between $S$
and 1, and that we have $n$ evaluations to perform. Show that


A\,  (  v,  v] 


I  v 


Kn  (S,  >') 


Min  Max 

i 

i  >/ 


Max  ,(V|V 

.nr  - -l  \  VV  V  / 


Max  (I  \ )  A\,  i 

•'  *  1  r  1  1  u) 


'  .  '' 

\  I  v  vj_ 

(Gross-Johnson)


85. A man is standing in a queue waiting for service, with $N$ people
ahead of him. He knows the utility of waiting out the queue, $r$, and the
probability $p$ that a person will be served in unit time. On the other
hand, he incurs a cost of $c$ for every unit of time spent waiting. The
problem is to determine his waiting policy if he wishes to maximize
his expected return.

Let $f_N$ denote the expected return obtained employing an optimal
waiting policy when there are $N$ people ahead. Show that

./a  Max  r  t  f  /'  I  0  />)/•'.  <> 
$N = 1, 2, \ldots$, with $f_0 = r$. Hence show that

f\  =  Max  \-t*  -  1 - - - ,  0  ] 

L(l  —px)  (1  —P)  J 

and  thus  determine  the  optimal  policy. 


(Haight) 


86.  Consider  the  same  problem  under  the  assumption  that  he  can  wait 
at  most  a  time  T.  (Haight) 


87.  What  policy  does  he  pursue  if  he  knows  that  a  probability  p  exists, 
but  does  not  know  its  precise  value  ?  (Haight) 


88.  Consider  a  forestry  firm  in  which  we  start  with  a  fixed  capital  and 
a  certain  presence  of  timber.  We  assume  that 


1.  There  is  a  fixed  initial  amount  of  cash  available,  and  no  revenue 
other  than  proceeds  from  selling  timber,  and  from  interest  on  cash 
on  hand.  No  borrowing  is  allowed,  and  all  current  expenses  must 
be  covered  by  cash  and  sales. 

2.  Trees  can  be  grown  only  from  seed;  it  is  impossible  to  buy  young 
trees  from  outside  the  “economy.” 

3.  The  annual  increment  of  “timber”  depends  on  the  age  of  the  tree 
(growth  rates  need  not  be  monotonic). 

4.  The  cost  of  “carrying”  a  growing  tree  for  one  year  depends  on  the 
age  of  the  tree. 

5. The selling value of a tree depends only on its timber content,
i.e. its age.

6.  The  aim  of  the  process  is  to  maximize  the  money  available  after 
a  fixed  number  of  years. 


Four  activities  may  be  engaged  in,  lending,  planting,  carrying,  and 
felling. 

1.  Money  can  be  lent  for  a  year  at  interest  rate  r. 

2.  Money  can  be  used  to  plant  trees. 

3.  Money  and  trees  can  be  used  to  provide  older  trees. 

4.  Trees  of  a  given  age  can  be  cut  down  to  provide  money. 

Over  a  given  time  period  how  does  one  proceed  so  as  to  maximize 
the  total  assets,  capital  plus  timber? 

(Morton, Dynamic Programming, Proceedings of an International Conference
on Input-Output Analysis, J. Wiley and Sons, 1956).

89. Consider a multi-component electronic system whose reliability may
be taken to be the product of the reliabilities of the individual components.
To improve the reliability of a particular stage, we can put a number
of units in parallel. Let $p_k(x_k)$ be the reliability of the $k$th stage when
$x_k$ units are put in parallel at the $k$th stage, and let $g_k(x_k)$ be the cost
of inserting $x_k$ units in parallel.

The problem is to maximize the total reliability

$P_N(x) = \prod_{k=1}^{N} p_k(x_k)$,

subject to the restrictions

a. $x_k = 1, 2, 3, \ldots$,

b. $\sum_{k=1}^{N} g_k(x_k) \le c$.

If $f_N(c) = \max P_N(x)$, show that

$f_N(c) = \max_x \left[ p_N(x) f_{N-1}(c - g_N(x)) \right]$,

where the maximum is over

a. $x = 1, 2, \ldots$

b. $g_N(x) \le c$. (Nadel)
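
The functional equation of Problem 89 can be evaluated by a short recursion. The Python sketch below is our own illustration, not part of the original text: it assumes integer costs, a cap `max_units` on the number of parallel units, the starting value $f_0(c) = 1$, and stage reliabilities of the hypothetical form $1 - (1 - r)^x$.

```python
from functools import lru_cache


def max_reliability(p, g, budget, max_units=10):
    """Evaluate f_N(c) = max_x [p_N(x) * f_{N-1}(c - g_N(x))] with f_0(c) = 1.

    p[k](x) is the reliability and g[k](x) the cost of stage k with x units.
    Returns (best reliability, units per stage), or (0.0, None) if infeasible.
    Assumptions of this sketch: integer costs, at most max_units per stage.
    """

    @lru_cache(maxsize=None)
    def f(n, c):
        if n == 0:
            return 1.0, ()
        best_value, best_plan = 0.0, None
        for x in range(1, max_units + 1):
            cost = g[n - 1](x)
            if cost > c:
                continue  # restriction (b): g_N(x) <= c
            value, plan = f(n - 1, c - cost)
            if plan is None:
                continue  # the earlier stages cannot be afforded
            value *= p[n - 1](x)
            if value > best_value:
                best_value, best_plan = value, plan + (x,)
        return best_value, best_plan

    return f(len(p), budget)


# Hypothetical data: x parallel units of reliability r give 1 - (1 - r)**x.
p = [lambda x, r=r: 1 - (1 - r) ** x for r in (0.9, 0.8)]
g = [lambda x, w=w: w * x for w in (1, 2)]
value, plan = max_reliability(p, g, budget=6)
```

With these data the budget is spent on two units at each stage; a third unit at the first stage would leave too little for the weaker second stage.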

90. Assume that there are two "costs," one in terms of actual money,
and the other in terms of weight.


91. Discuss the connections between the following problems:

a. Maximize $\prod_{k=1}^{N} p_k(x_k)$, subject to $\sum_{k=1}^{N} g_k(x_k) \le c_1$,
   $\sum_{k=1}^{N} h_k(x_k) \le c_2$, and $x_k = 1, 2, \ldots$.

b. Maximize $\prod_{k=1}^{N} p_k(x_k) - \lambda_1 \sum_{k=1}^{N} g_k(x_k) - \lambda_2 \sum_{k=1}^{N} h_k(x_k)$,
   subject to $x_k = 1, 2, \ldots$.

c. Maximize $\prod_{k=1}^{N} p_k(x_k) - \lambda_1 \sum_{k=1}^{N} g_k(x_k)$, subject to
   $\sum_{k=1}^{N} h_k(x_k) \le c_2$, and $x_k = 1, 2, \ldots$.

d. Minimize $\sum_{k=1}^{N} g_k(x_k) + \lambda_2 \sum_{k=1}^{N} h_k(x_k)$, subject to
   $\prod_{k=1}^{N} p_k(x_k) \ge r$, $x_k = 1, 2, \ldots$.

92. Obtain the corresponding functional equations, and discuss the
question of most convenient computation.

93.  The  requirement  for  a  machine  of  a  certain  type  as  a  function  of 
time  is  known.  It  is  desired  to  institute  a  procurement  policy  to  meet 
this  demand  at  minimum  cost,  given  the  following  information. 

1. Procurement of new machines costs $p$ dollars per machine.

2.  Maintenance  of  a  machine  costs  m  dollars  per  time  period. 

3.  Cost  of  upkeep  and  repair  per  period  is  a  known  function  of  the 
number  of  machines  on  hand  and  the  number  required. 

Show  that  the  corresponding  functional  equation  is 


/.v  (v)  Min  Pzi  +  d/  (:,  ,v)  +  L\  (;■  +  .v)  +  fs  _  i  (.v  +  *i)  ], 

where $z_1$ can assume only the values 0, 1, 2, ... .

Obtain  the  solution  under  the  assumption  that  each  function  L A-  (x) 
has  the  form 


and, as a special case, is parabolic, a quadratic in $x$.

94. Consider the problem for the case where two distinct types of
machines  are  being  procured,  with  joint  maintenance  facilities,  but 
independent  demand. 
Bibliography and Comments for Chapter III

§ 1. The basic ideas of this chapter, together with the "principle of
optimality," were first stated in the monograph "An Introduction to the
Theory of Dynamic Programming," RAND Corporation, 1953, an outgrowth
of a shorter paper written in 1952, but not published then. This
paper, in turn, was the result of research done in 1949, 1950 and 1951, and
contained in a number of unpublished papers.

§ 3. As we have recently shown in connection with some joint work
(R. Bellman and R. Kalaba, "On the Principle of Invariant Imbedding and
Propagation Through Inhomogeneous Media," Proc. Nat. Acad. Sci., (1956)),
the "principle of optimality" is actually a particular application of what we
have called the "principle of invariant imbedding." A special form of the
invariance principle was used by Ambarzumian, "On the Scattering of
Light by a Diffuse Medium," C. R. Doklady, Sci. U.R.S.S., 38 (1943), p. 257,
and extensively developed by S. Chandrasekhar, Radiative Transfer, Oxford,
1950. An early use of the method is due to G. Stokes (Mathematical
and Physical Papers, Vol. IV, "On the intensity of the light reflected from
or transmitted through a pile of plates," pp. 145-156).

The functional equation technique used throughout is intimately related
to the "Point of Regeneration" method used in the study of branching
processes, cf. R. Bellman and T. E. Harris, "On Age-Dependent Binary
Branching Processes," Ann. Math., Vol. 55 (1952), pp. 280-295.

Actually, we have made no systematic effort to trace the origin and use
of invariance principles, and the above references represent only a few of
the many that could be cited. One, however, which cannot be ignored is
J. Hadamard, "Le principe de Huygens," Bull. Soc. Math. France, 52 (1924),
pp. 610-640, where there is an interesting discussion of causality, functional
equations and Huygens' principle.

The classic reference to semi-group theory is E. Hille, "Functional Analysis
and Semi-groups," Amer. Math. Soc., 1948.

§ 6. A detailed discussion of the formulation of variational problems as
continuous decision processes will be found in Chapter 9.

§ 9. A discussion of causality and optimality, together with the interrelation
with semi-groups may be found in R. Bellman, "Dynamic Programming and A
New Formalism in the Theory of Integral Equations," Proc. Nat. Acad. Sci.,
Vol. 41 (1955), pp. 31-34.

Problem 92. See R. Bellman, "Dynamic Programming and Lagrange Multipliers,"
Proc. Nat. Acad. Sci., Vol. 42 (1956), pp. 767-769.
Existence and Uniqueness Theorems

§  1.  Introduction 

In  the  previous  chapter  we  outlined  the  skeletal  structure  of  dynamic 
programming  processes  and  derived  various  general  classes  of  functional 
equations.  In  this  chapter,  we  shall  abstract  the  particular  methods 
utilized  in  Chapter  I  and  II  to  treat  the  equations  occurring  therein  and 
derive  some  existence  and  uniqueness  theorems  for  the  more  general 
equations  of  Chapter  III.  Our  principal  tool  will  be  the  method  of  successive 
approximations  due  to  Picard. 

Although  all  the  proofs  follow  essentially  a  common  track,  each  requires 
its  own  detour  at  an  appropriate  point.  Consequently,  in  place  of  at¬ 
tempting  to  frame  the  hypotheses  in  such  general  terms  that  we  can  state 
all  our  results  in  a  single  theorem,  at  the  possible  expense  of  clarity  and 
loss  of  understanding  of  the  simple  mechanism  involved,  we  have  divided 
our  results  into  a  number  of  theorems  referring  to  particular  classes  of 
equations.  The  basic  method  of  proof  is,  however,  the  same  throughout. 

Our first step consists of formalizing the device we have used before to
compare the solutions of two equations, cf. § 7 of Chapter I and § 6 of
Chapter II. The resulting inequality is essential to our proofs in this chapter,
and will be utilized again in our treatment of multi-stage games in a
later chapter, and in comparison theorems in the calculus of variations in
Chapter IX.

The first class of equations we treat are those where each operation
results in a shrinking of resources, which is to say, the point transformations
involved are shrinking transformations in the sense of Cacciopoli.
Equations of this type we rather unimaginatively call equations of type one.

The  next  class  of  equations  which  we  discuss  are  those  where  the  prob¬ 
ability  of  survival  decreases  uniformly  with  each  operation.  This  is 
equivalent  to  the  functional  transformation  being  a  shrinking  transfor¬ 
mation.  These  equations  we  name  equations  of  type  two. 

Both types have, in particular cases, the form

(1) $f(p) = \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$,

where the quantities occurring are as defined in the previous chapter.
As we shall see, these equations are rather readily treated by standard
iterative techniques, with the aid of our basic inequality. Equations which
do not belong to either of these classes usually require some fancier
techniques, as we shall see in our treatment of a particular equation in
§ 8. All equations not of types one or two, we blithely lump together as
those of type three.

Following these results on existence and uniqueness, we shall discuss
monotone convergence in a general setting, and state some general stability
theorems established in the same fashion as before.

After indicating some directions of generalization, which can be carried
quite  far,  we  shall  consider  a  particular  equation  of  type  three,  as  men¬ 
tioned  above.  Here  we  have  a  combination  of  two  types  of  shrinking 
transformations,  and  the  treatment  is  a  bit  more  involved. 

We shall close the chapter with a discussion of an interesting integral
equation arising in the theory of "optimal inventory" or "stock control,"
a subject which we shall treat in greater detail in the following chapter,
where particular solutions are obtained.

Apart  from  their  interest  in  connection  with  multi-stage  decision  pro¬ 
cesses,  the  equations  we  consider  possess  the  analytic  merit  of  constitu¬ 
ting  in  many  ways  a  natural  extension  of  linear  equations.  As  such,  their 
study is valuable since they serve as a bridge between the well-regulated
preserve  of  linear  equations  and  the  as  yet  untamed  jungle  of  nonlinear 
equations. 

§ 2. A fundamental inequality

Let us consider the two functional transformations

(1) $S_1(f, p, q) = g(p, q) + \int_{r \in D} f(r) \, dG(p, q, r)$,

    $S_2(f, p, q) = h(p, q) + \int_{r \in D} f(r) \, dG(p, q, r)$,

where $dG(p, q, r) \ge 0$, and define two additional transformations as
follows:

(2) a. $f_2(p) = \sup_q S_1(f_1, p, q)$
    b. $F_2(p) = \sup_q S_2(F_1, p, q)$.

There is no need to go into a discussion of what we mean by the Stieltjes
integral here since we are using it in a purely formal manner. All our
results will actually be utilized for the case where
$\int_{r \in D} f(r) \, dG(p, q, r) = h(p, q) f(T(p, q))$, and the reader unfamiliar
with the Stieltjes integral need merely make this transformation to reduce
all the equations to familiar terms, or he may consider $dG(p, q, r)$ to have
the form $H(p, q, r) \, dr$, with $H \ge 0$.

The inequality we wish to prove is


Lemma 1.

(3) $|f_2(p) - F_2(p)| \le \sup_q \left[ |g(p, q) - h(p, q)| + \int_{r \in D} |f_1(r) - F_1(r)| \, dG(p, q, r) \right]$.


Proof. Let us simplify the notation initially by assuming that both
transformations in (2) have the property that the supremum is actually a
maximum. Let then $q = q(p)$ be a value of $q$ for which the maximum is
assumed in (2a), and $\bar{q} = \bar{q}(p)$ be a value of $q$ for which the maximum is
assumed in (2b). Then we have the following set of equalities and
inequalities:

(4) a. $f_2(p) = S_1(f_1, p, q) \ge S_1(f_1, p, \bar{q})$
    b. $F_2(p) = S_2(F_1, p, \bar{q}) \ge S_2(F_1, p, q)$,

as in § 7 of Chapter I and § 6 of Chapter II.

From these follow immediately

(5) $f_2(p) - F_2(p) \ge \{ g(p, \bar{q}) - h(p, \bar{q}) \} + \int_{r \in D} (f_1(r) - F_1(r)) \, dG(p, \bar{q}, r)$,

and

    $f_2(p) - F_2(p) \le \{ g(p, q) - h(p, q) \} + \int_{r \in D} (f_1(r) - F_1(r)) \, dG(p, q, r)$.

These, in turn, yield the single inequality

(6) $|f_2(p) - F_2(p)| \le \max \left[ |g(p, q) - h(p, q)| + \int_{r \in D} |f_1(r) - F_1(r)| \, dG(p, q, r), \;\; |g(p, \bar{q}) - h(p, \bar{q})| + \int_{r \in D} |f_1(r) - F_1(r)| \, dG(p, \bar{q}, r) \right]$,

from which the result in (3) is immediate.¹

To obtain the result as stated in terms of the supremum it is only
necessary to note that the supremum may be obtained arbitrarily closely
by the value of the function for some $q = q(p)$. The argument then proceeds
via a limiting procedure.

¹ We are using the simple result that $a \le x \le b$ implies $|x| \le \max(|a|, |b|)$.



§ 3. Equations of Type One

Let us now impose the following conditions upon the functions entering
into the equation of (1.1):

(1) a. $|g(p, q)|$ is uniformly bounded for all $q \in S$ and all $p \in D$
       satisfying the restriction $\|p\| \le c_1$, where $\|p\|$ denotes the norm of $p$.
       $D$ is the domain of $f$, it contains the null vector
       $p = 0$, and $T(p, q) \in D$ for all $p \in D$.

    b. $g(0, q) = 0$ for all $q \in S$.

    c. $|h(p, q)| \le 1$ for all $p \in D$ and $q \in S$.

    d. $\|T(p, q)\| \le a \|p\|$, for some $a < 1$, for all $q \in S$ and
       all $p \in D$.

    e. If $v(c) = \sup_{\|p\| \le c} \sup_q |g(p, q)|$, then $\sum_{n=0}^{\infty} v(a^n c) < \infty$.

Equations which satisfy these assumptions are called equations of Type
One. In many cases, it may be more convenient, and natural, to use the
norm $\|p\| = \sum_{i=1}^{N} |p_i|$. It will be clear from the argumentation below
that the precise form of the norm is of little importance.

Our  principal  result  concerning  these  equations  is  the  following: 


Theorem 1. Consider the equation

(2) $f(p) = \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$, $p \ne 0$,

    $f(0) = 0$,

assumed to be of Type One.

There is exactly one solution of (2) which is continuous at $p = 0$ and equal
to zero there, and defined over all of $D$.

This solution may be obtained as the limit of the sequence $\{f_n(p)\}$ defined
as follows:

(3) a. $f_0(p) = \sup_q g(p, q)$
    b. $f_{n+1}(p) = \sup_q \left[ g(p, q) + h(p, q) f_n(T(p, q)) \right]$, $n = 0, 1, 2, \ldots$

Alternatively, any initial function $f_0(p)$ which is continuous at $p = 0$ and
equal to zero there, and bounded for $\|p\| \le c_1$ for any $c_1 > 0$, $p \in D$, may be
used in (3b) to yield a convergent sequence.

If $g(p, q)$, $h(p, q)$, and $T(p, q)$ are continuous in $p$ in any bounded
portion of $D$, uniformly for all $q \in S$, then $f(p)$ is continuous in any bounded
portion of $D$.

Proof. Let us consider the sequence defined by (3). Using Lemma 1,
proved in § 2, we have for $n \ge 1$,

(4) $|f_{n+1}(p) - f_n(p)| \le \sup_q |h(p, q)| \, |f_n(T(p, q)) - f_{n-1}(T(p, q))|$
    $\le \sup_q |f_n(T(p, q)) - f_{n-1}(T(p, q))|$,

and

(5) $|f_1(p) - f_0(p)| \le \sup_q |f_0(T(p, q))| = \sup_q |g(p, q)|$.

Let us now define the new sequence

(6) $v_n(c) = \sup_p |f_{n+1}(p) - f_n(p)|$, $\|p\| \le c$, $p \in D$.

Using the function defined in (1e), we see that $v_0(c) = v(c)$. Turning
to (4) we have, for $p \in D$, $\|p\| \le c$,

(7) $\sup_p |f_{n+1}(p) - f_n(p)| \le \sup_p \sup_q |f_n(T(p, q)) - f_{n-1}(T(p, q))|$
    $\le \sup_{\|p\| \le ac} |f_n(p) - f_{n-1}(p)|$,

by virtue of our assumption concerning $T(p, q)$. Hence $v_{n+1}(c) \le v_n(ac)$,
$n = 0, 1, 2, \ldots$, or $v_n(c) \le v_0(a^n c)$. It follows that the series
$\sum_{n=0}^{\infty} [f_{n+1}(p) - f_n(p)]$ converges uniformly for $\|p\| \le c$, and hence
that $\{f_n(p)\}$ converges uniformly to a function $f(p)$ for $\|p\| \le c$.

This completes the proof of existence and the proof of the statements
concerning convergence and continuity.

To establish uniqueness, let $f(p)$ and $F(p)$ be two solutions of (2)
continuous at $p = 0$, and hence defined for all $p \in D$. Let

(8) $v(c) = \sup_p |f(p) - F(p)|$, $\|p\| \le c$, $p \in D$.

Applying Lemma 1, we have

(9) $|f(p) - F(p)| \le \sup_q |f(T(p, q)) - F(T(p, q))|$,

whence

(10) $v(c) \le v(ac) \le \cdots \le v(a^n c)$.

Since $f(p)$ and $F(p)$ are continuous at $p = 0$, $v(a^n c) \to 0$ as $n \to \infty$.
Hence $v(c) = 0$, and $f(p) = F(p)$.

The utility of Lemma 1 lies in the fact that it enables us to bypass any
discussion of the behavior of the maximizing $q$ as a function of $p$, a subject
of great difficulty about which little is known, in general.
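
The successive approximations (3) of Theorem 1 are easy to carry out numerically. The Python sketch below is only an illustration under assumptions of our own: a scalar resource $p \ge 0$, a finite set of choices, linear returns $g(p, q) = c_q p$, shrinking transformations $T(p, q) = a_q p$ and $h = 1$. For the data chosen, $f(p) = 2p$ satisfies the equation, since $\max(1 + 0.5 \cdot 2, \; 1.5 + 0.2 \cdot 2) = 2$.

```python
def successive_approximation(g, h, T, choices, n):
    """Return f_n of (3): f_0 = max_q g, f_{m+1} = max_q [g + h * f_m(T)]."""

    def f(p, depth=n):
        if depth == 0:
            return max(g(p, q) for q in choices)
        return max(
            g(p, q) + h(p, q) * f(T(p, q), depth - 1) for q in choices
        )

    return f


# Hypothetical Type One example on D = [0, infinity): each choice q = (c, a)
# yields the return c*p and shrinks the resource p to a*p, with a < 1.
choices = [(1.0, 0.5), (1.5, 0.2)]
g = lambda p, q: q[0] * p   # g(0, q) = 0, as condition (1b) requires
h = lambda p, q: 1.0        # |h| <= 1, condition (1c)
T = lambda p, q: q[1] * p   # ||T(p, q)|| <= 0.5 ||p||, condition (1d)

f14 = successive_approximation(g, h, T, choices, 14)

# Successive differences |f_{n+1}(3) - f_n(3)| for n = 0, ..., 7.
gaps = [
    abs(successive_approximation(g, h, T, choices, n + 1)(3.0)
        - successive_approximation(g, h, T, choices, n)(3.0))
    for n in range(8)
]
```

The direct recursion costs a number of evaluations that grows exponentially with $n$, so it serves only to exhibit the convergence; a tabulation of $f_n$ over a grid of values of $p$ would be the practical route.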

§ 4. Equations of Type Two

Let us now consider the equation of (1.1) where we impose the conditions

(1) a. $|g(p, q)|$ is uniformly bounded for all $q \in S$, and
       $\|p\| \le c_1$, $p \in D$.

    b. $|h(p, q)| \le a < 1$ for all $q \in S$ and uniformly in any
       region $\|p\| \le c_1$, $p \in D$.

    c. $\|T(p, q)\| \le \|p\|$ for all $p$, or alternatively $D$ is a
       bounded region, and no condition is imposed upon $T$ apart
       from the condition that $T(p, q) \in D$ for all $p \in D$.

Equations satisfying these conditions we shall call equations of Type Two.
We shall demonstrate

Theorem 2. If

(2) $f(p) = \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$

is an equation of Type Two, there is a unique solution which is bounded in
any finite part of $D$.

The solution may be found by means of successive approximations as
before, and the previous statements concerning continuity of the solution
remain valid.

Proof. Let

(3) $f_0(p) = \sup_q g(p, q)$,

    $f_{n+1}(p) = \sup_q \left[ g(p, q) + h(p, q) f_n(T(p, q)) \right]$, $n = 0, 1, 2, \ldots$

Using Lemma 1, we have

(4) $|f_{n+1}(p) - f_n(p)| \le \sup_q \left| h(p, q) \left[ f_n(T(p, q)) - f_{n-1}(T(p, q)) \right] \right|$
    $\le a \sup_q |f_n(T(p, q)) - f_{n-1}(T(p, q))|$,

where $a < 1$. From this point on the proof clearly parallels the proof of
Theorem 1. The vanishing at $p = 0$ is now a consequence of the equation
itself.
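
The geometric convergence expressed by (4) can be observed directly on a small example. The following Python sketch is illustrative only; the finite state set, the two choices, the functions `g` and `T`, and the constant value $h(p, q) = a = 0.5$ are all assumptions of ours.

```python
def type_two_iterates(g, T, a, n_states, n_choices, n_iter):
    """Iterate f_{n+1}(p) = max_q [g(p, q) + a * f_n(T(p, q))] on a finite D.

    Returns the final iterate and the sup-norm gaps max_p |f_{n+1}(p) - f_n(p)|.
    """
    states, choices = range(n_states), range(n_choices)
    f = [max(g(p, q) for q in choices) for p in states]  # f_0 of (3)
    gaps = []
    for _ in range(n_iter):
        new = [max(g(p, q) + a * f[T(p, q)] for q in choices) for p in states]
        gaps.append(max(abs(x - y) for x, y in zip(new, f)))
        f = new
    return f, gaps


# Hypothetical data: five states, two choices, constant h(p, q) = a = 0.5.
g = lambda p, q: p - q
T = lambda p, q: (p + 1) % 5 if q == 0 else (2 * p) % 5
f, gaps = type_two_iterates(g, T, a=0.5, n_states=5, n_choices=2, n_iter=40)
```

Each entry of `gaps` is at most $a$ times its predecessor, which is the content of (4) taken over the whole of $D$.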
§ 5. Monotone convergence

We have in the preceding sections demonstrated convergence of the
successive approximations under assumptions which yielded essentially
geometric convergence. Let us now show that, under the assumption
that $h(p, q) \ge 0$, which is true in all the applications to date, we have at
our disposal a method of choosing an initial approximation which will
yield monotone convergence in addition.

In  some  equations  of  Type  Three,  where  convergence  of  geometric  type 
is  either  difficult  to  establish,  or  else  non-existent,  this  is  a  valuable 
technique. 

Let us consider our equation in the form

(1) $f(p) = \max_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$.

Let $q_0 = q_0(p)$ be an initial approximation to $q(p)$ and let $f_0(p)$ be
determined by use of this policy, i.e.,

(2) $f_0(p) = g(p, q_0) + h(p, q_0) f_0(T(p, q_0))$,

and the sequence $\{f_n(p)\}$, $n = 1, 2, \ldots$, then be determined recursively,

(3) $f_{n+1}(p) = \sup_q \left[ g(p, q) + h(p, q) f_n(T(p, q)) \right]$, $n = 0, 1, 2, \ldots$

[Having introduced the concept of approximation in policy space, it is
now convenient to use the supremum again to bypass questions of no
little difficulty, concerning continuity over $q$.] Let us assume, as in the
case of equations of Types One and Two, that sufficient conditions have
been imposed to have the sequence $\{f_n(p)\}$ uniformly bounded in any
finite portion of $D$.

It is immediately seen that $f_1(p) \ge f_0(p)$, and therefore, by virtue of
the non-negativity of $h(p, q)$, that $f_{n+1}(p) \ge f_n(p)$ for all $n$. It follows
that $f_n(p)$ converges to a function $f(p)$ as $n \to \infty$, in any finite part of $D$.

If $q$ is a member of a finite set $S$, there is no question of the convergence
of $\{f_n(p)\}$ to an actual solution of (1), where the supremum is now
a maximum. If $S$ contains a continuum, it is perhaps not immediate that
$f(p)$ is the bounded solution of

(4) $f(p) = \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$.

To establish this, we observe that by virtue of the monotone convergence,
we have

(5) $f_{n+1}(p) \le \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$,

whence

(6) $f(p) \le \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$.

On the other hand, we have

(7) $f(p) \ge \sup_q \left[ g(p, q) + h(p, q) f_n(T(p, q)) \right]$
    $\ge g(p, q) + h(p, q) f_n(T(p, q))$

for all $q \in S$ and all $n$. Letting $n \to \infty$, we obtain the reverse inequality to
(6), and hence equality.

This property of monotone convergence or, at worst, monotone approximation,
is particularly useful in other parts of the theory of dynamic
programming, and in particular, in applications to the calculus of variations,
as we shall see in a later chapter.
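
Approximation in policy space, followed by the recursion (3), may be sketched as follows. The Python code is an illustration under assumptions of our own: a finite state set, two choices, a constant $h(p, q) = a \ge 0$, and an initial policy that always takes the second choice. The function $f_0$ of (2) is obtained by repeated substitution, which converges here because $a < 1$.

```python
def evaluate_policy(g, T, a, policy, sweeps=200):
    """Approximate f_0 of (2) for a fixed policy q_0(p) by repeated substitution."""
    f = [0.0] * len(policy)
    for _ in range(sweeps):
        f = [g(p, policy[p]) + a * f[T(p, policy[p])] for p in range(len(policy))]
    return f


def improve(g, T, a, f, n_choices):
    """One step of (3): f_{n+1}(p) = max_q [g(p, q) + a * f_n(T(p, q))]."""
    return [
        max(g(p, q) + a * f[T(p, q)] for q in range(n_choices))
        for p in range(len(f))
    ]


# Hypothetical data: five states, two choices, h(p, q) = a = 0.5 >= 0.
g = lambda p, q: p - q
T = lambda p, q: (p + 1) % 5 if q == 0 else (2 * p) % 5
a = 0.5

# Start from the policy "always choose q = 1" and apply (3) thirty times.
sequence = [evaluate_policy(g, T, a, policy=[1, 1, 1, 1, 1])]
for _ in range(30):
    sequence.append(improve(g, T, a, sequence[-1], n_choices=2))
```

Every list in `sequence` dominates its predecessor state by state, up to rounding, which is the monotone behavior established above.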

§ 6. Stability theorems

In  the  theory  of  functional  equations  a  problem  of  great  theoretical 
interest,  with  important  physical  ramifications,  is  that  of  the  dependence 
of  the  solution  upon  the  form  of  the  equation.  In  particular,  a  great  deal 
of  effort  has  been  devoted  to  the  determination  of  those  equations  which 
have  the  property  that  small  changes  in  the  form  of  the  equation  effect 
correspondingly  small  changes  in  the  form  of  the  solution.  Equations 
which  do  not  have  this  property  are  in  the  main  of  little  physical  interest. 
Let  us  now  consider  the  two  equations, 

(1) a. $f(p) = \sup_q \left[ g(p, q) + h(p, q) f(T(p, q)) \right]$,

    b. $F(p) = \sup_q \left[ G(p, q) + h(p, q) F(T(p, q)) \right]$,

and assume, to begin with, that they are both of Type One. We wish to
obtain an inequality for $\sup_p |f(p) - F(p)|$, $p \in D$, $\|p\| \le c$, where $f$
and $F$ are the unique solutions vanishing at $p = 0$, and continuous there,
of their respective equations.

To obtain this inequality, we employ the method of successive approximations
in both equations, setting

(2) $f_1(p) = \sup_q g(p, q)$,

    $f_{n+1}(p) = \sup_q \left[ g(p, q) + h(p, q) f_n(T(p, q)) \right]$,

    $F_1(p) = \sup_q G(p, q)$,

    $F_{n+1}(p) = \sup_q \left[ G(p, q) + h(p, q) F_n(T(p, q)) \right]$.
9 
We have

(3) $|F_1(p) - f_1(p)| \le \sup_q |G(p, q) - g(p, q)|$,

and

(4) $|F_{n+1}(p) - f_{n+1}(p)| \le \sup_q \left[ |G(p, q) - g(p, q)| + |h(p, q)| \, |F_n(T(p, q)) - f_n(T(p, q))| \right]$.

Let us define

(5) $u(c) = \sup_{\|p\| \le c} \sup_q |G(p, q) - g(p, q)|$.

Then we have

Theorem 3. With the above notation, for equations of Type One,

(6) $\sup_{\|p\| \le c} |F(p) - f(p)| \le \sum_{n=0}^{\infty} u(a^n c)$.

Proof. Set

(7) $w_n(c) = \sup_{\|p\| \le c} |F_n(p) - f_n(p)|$.

It can be shown inductively that we have $w_n(c) \le \sum_{k=0}^{n-1} u(a^k c)$, $n \ge 1$,
using (4), and the hypotheses governing an equation of Type One. Letting
$n \to \infty$, we obtain (6), since $F_n(p) \to F(p)$, and $f_n(p) \to f(p)$.

Similarly,

Theorem 4. With the above notation, for equations of Type Two,

(8) $\sup_{\|p\| \le c} |F(p) - f(p)| \le u(c)/(1 - a)$.

The proof follows the same lines as above, and is therefore omitted.
Similar estimates can be obtained in the cases where $h(p, q)$ and
$T(p, q)$ are perturbed.
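
The estimate of Theorem 4 can be checked numerically on a small equation of Type Two. In the Python sketch below, which is only an illustration, the state set, the functions `g` and `T`, the constant $h(p, q) = a$ and the perturbation of size `eps` are all hypothetical choices of ours.

```python
def solve(g, T, a, n_states, n_choices, n_iter=200):
    """Successive approximations for f(p) = max_q [g(p, q) + a * f(T(p, q))]."""
    f = [0.0] * n_states
    for _ in range(n_iter):
        f = [
            max(g(p, q) + a * f[T(p, q)] for q in range(n_choices))
            for p in range(n_states)
        ]
    return f


a, eps = 0.5, 0.01
T = lambda p, q: (p + 1) % 5 if q == 0 else (2 * p) % 5
g = lambda p, q: p - q
G = lambda p, q: g(p, q) + eps * (-1) ** (p + q)  # perturbed return, |G - g| = eps

f = solve(g, T, a, 5, 2)
F = solve(G, T, a, 5, 2)
deviation = max(abs(x - y) for x, y in zip(F, f))
bound = eps / (1 - a)  # u(c) / (1 - a) of (8), with u(c) = eps
```

Here `deviation` plays the role of the left side of (8) and `bound` that of the right side.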

§  7.  Some  directions  of  generalization 

A first generalization of (1.1) is the equation

(1) $f(p) = \sup_q \left[ g(p, q) + \sum_{i} h_i(p, q) f(T_i(p, q)) \right]$,

which, in turn, is a particular case of

(2) $f(p) = \sup_q \left[ g(p, q) + \int_{r \in D} f(r) \, dG(p, q, r) \right]$.

The methods utilized above yield analogues of the preceding theorems
concerning existence, uniqueness and stability for the above equations,
and systems of the form

(3) $f_i(p) = \sup_q \left[ g_i(p, q) + \sum_{j=1}^{N} \int_{r \in D} f_j(r) \, dG_{ij}(p, q, r) \right]$, $i = 1, 2, \ldots, N$,

which is equivalent in form to (2) if we employ vector-matrix notation.
An example of (2) is the equation of "optimal inventory,"

(4) $f(x) = \inf_{y \ge x} \left[ g(x, y) + a \left[ (1 - G(y)) f(0) + \int_0^y f(y - s) \, dG(s) \right] \right]$,

which we shall treat in detail in the next chapter.

§  8.  An  equation  of  the  third  type 

The  technique  of  approximation  in  policy  space  which  yields  monotone 
convergence,  discussed  above  in  §  5,  is  very  useful  in  establishing  the  exist¬ 
ence  of  solutions  of  equations  of  Type  Three,  a  class,  let  us  recall,  defined 
quite  simply  as  the  complementary  class  of  equations  of  Type  One  or 
Type  Two. 

Establishing  the  uniqueness  of  the  solution  of  equations  of  Type  Three 
is,  in  general,  a  problem  of  a  greater  level  of  difficulty,  as  we  shall  see 
below,  and  in  a  later  chapter  on  multi-stage  games  where  we  discuss 
"games  of  survival.” 

Let us illustrate these remarks by considering the functional equation,

(1) $f(p) = \min \left[ 1 + \sum_{k=0}^{N} p_k f(x_k), \; \min_l \left[ 1 + f(T_l p) \right] \right]$, $p \ne x_0$,

    $f(x_0) = 0$,

where $l$ runs over the set of integers $1, 2, \ldots, M$. Here we set

(2) $p = (p_0, p_1, \ldots, p_N)$, $p_i \ge 0$, $\sum_{i=0}^{N} p_i = 1$;

    $T_l p = (p_{0l}, p_{1l}, \ldots, p_{Nl})$, $p_{il} \ge 0$, $p_{0l} \ne 1$, $\sum_{i=0}^{N} p_{il} = 1$,

where $p_{il} = p_{il}(p)$; $l = 1, 2, \ldots, M$;
$x_k = (0, \ldots, 1, \ldots, 0)$, the 1 occurring in the $k$th place,
$k = 0, 1, \ldots, N$.

The  function  /  (p)  is  a  scalar  function  of  p. 
This equation is a greatly extended version of the equation appearing
in Exercise 30 of Chapter I.

This equation can be considered to arise in the following way. A system
is known to be in one of $(N + 1)$ different states, which we denote by
$0, 1, 2, \ldots, N$, with an initial probability $\{p_k\}$ that it is in the $k$th state.

By  means  of  a  combination  of  the  following  operations,  each  of  which 
consumes  a  unit  time,  we  wish  to  transform  the  system  into  the  0-state, 
with  certainty  that  it  is  in  that  state,  in  a  minimum  expected  time: 

L :  We  observe  the  actual  state  of  the  system  and  proceed  with  that 
knowledge ; 

A:  We  perform  an  operation  At  that  converts  the  original  prob¬ 
ability  distribution  {/>*}  into  a  new  distribution  {/>*/}. 

Let  p  =  ( p0 ,  pit  . . .,  ps),  and  f(p)  denote  the  expected  time  required 
using  an  optimal  policy,  when  the  system  is  initially  in  state  p.  Then  f(p) 
satisfies  (1)  above. 

We shall prove

Theorem 5. If for each transformation $T_l$, and for all p, it is true that

(3)  $\sum_{k=1}^{n} p_{kl} \le c_l, \quad 0 < c_l < 1,$

then there exists a unique bounded solution to (1) above. This function is positive for $p \ne x_0$.

Proof. We shall employ the method of successive approximations, using as our first approximation an approximation in policy space. Let us represent by L the choice of $1 + \sum_{k=0}^{n} p_k f(x_k)$, and by $T_1$ the choice of l = 1 in (1). We consider the function $F_1(p)$ determined by the policy symbolized by $L T_1 L T_1 \ldots$, and the function $F_2(p)$ determined by the policy $T_1 L T_1 L \ldots$. It is clear that

(4)  $F_2(p) = 1 + F_1(T_1 p), \quad p \ne x_0,$
     $F_1(p) = 1 + \sum_{k=0}^{n} p_k F_2(x_k), \quad p \ne x_0,$
     $F_1(x_0) = F_2(x_0) = 0.$

Hence, for l = 1, 2, ..., n,

(5)  $F_2(x_l) = 2 + \sum_{k=1}^{n} p_{kl} F_2(x_k), \quad l = 1, 2, \ldots, n,$

where $p_{kl}$ here denotes the kth component of $T_1 x_l$.

Since, by assumption, $\sum_{k=1}^{n} p_{kl} \le c_l < 1$, the determinant of the system does not vanish and the system has a unique solution, necessarily positive, as we see by solving iteratively. Having determined $F_2(x_l)$, the determination of $F_1(p)$ and $F_2(p)$ for general p is immediate.

To begin our successive approximations, define

(6)  $f_0(p) = \min [F_1(p), F_2(p)],$
     $f_{n+1}(p) = \min \Big[ 1 + \sum_{k} p_k f_n(x_k),\ \min_{l} \big[ 1 + f_n(T_l p) \big] \Big], \quad p \ne x_0,$
     $f_{n+1}(x_0) = 0.$

It is readily seen that $f_0(p) \ge f_1(p) \ge \ldots \ge f_n(p) \ge 1$, $p \ne x_0$.

Hence $f_n(p)$ converges monotonically to a function f(p) which clearly satisfies the functional equation. This establishes the existence of a bounded solution.
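As a concrete illustration of the successive approximations in (6), the following Python sketch treats a hypothetical two-state special case that does not appear in the text: the states are 0 and 1, the distribution is described by u = p_1, and there is a single transformation $T_1$ which is assumed to replace u by C u with C = 1/2, so that condition (3) holds with $c_1 = C$. The names `C`, `F2_X1`, `first_approximation` and `f` are illustrative assumptions only.

```python
from functools import lru_cache

C = 0.5                    # assumed: T_1 sends u = p_1 to C * u, so c_1 = C in (3)
F2_X1 = 2.0 / (1.0 - C)    # solves F_2(x_1) = 2 + C * F_2(x_1), the system (5)


def first_approximation(u):
    """f_0 = min(F_1, F_2) as in (6); u is the probability of state 1."""
    if u == 0.0:
        return 0.0                       # p = x_0
    F1 = 1.0 + u * F2_X1                 # policy L T_1 L T_1 ...
    F2 = 2.0 + C * u * F2_X1             # policy T_1 L T_1 L ..., i.e. 1 + F_1(C u)
    return min(F1, F2)


@lru_cache(maxsize=None)
def f(n, u):
    """n-th successive approximation f_n evaluated at p = (1 - u, u)."""
    if u == 0.0:
        return 0.0
    if n == 0:
        return first_approximation(u)
    observe = 1.0 + u * f(n - 1, 1.0)    # L-move: 1 + sum_k p_k f_{n-1}(x_k)
    transform = 1.0 + f(n - 1, C * u)    # T-move: 1 + f_{n-1}(T_1 p)
    return min(observe, transform)


for n in (0, 1, 5, 20):
    print(n, f(n, 1.0), f(n, 0.3))
```

In this small example the iterates are non-increasing in n and bounded below by 1 for u > 0, in line with the monotone convergence asserted above; the memoization merely avoids recomputing $f_{n-1}$ at the same points.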

The uniqueness proof is considerably more complicated and proceeds in a series of steps. Let f(p) and g(p) be two bounded solutions of (1). The first step is

Lemma 2. $\sup_{p} |f(p) - g(p)| = \max_{k} |f(x_k) - g(x_k)|$.

Proof. The inequality

(7)  $\max_{k} |f(x_k) - g(x_k)| \le \sup_{p} |f(p) - g(p)|$

is clear. To demonstrate the reverse inequality, we consider four cases:

(8)  (a)  $f(p) = 1 + \sum_{k=0}^{n} p_k f(x_k), \quad g(p) = 1 + \sum_{k=0}^{n} p_k g(x_k)$
     (b)  $f(p) = 1 + \sum_{k=0}^{n} p_k f(x_k), \quad g(p) = 1 + g(T_l p)$
     (c)  $f(p) = 1 + f(T_l p), \quad g(p) = 1 + \sum_{k=0}^{n} p_k g(x_k)$
     (d)  $f(p) = 1 + f(T_l p), \quad g(p) = 1 + g(T_{l'} p)$
Consider first the case corresponding to (a). We have

(9)  $f(p) - g(p) = \sum_{k=0}^{n} p_k [f(x_k) - g(x_k)],$

whence

(10)  $|f(p) - g(p)| \le \max_{k} |f(x_k) - g(x_k)|.$

Therefore for all p for which (8a) holds, the lemma is correct. Equation (8a) will hold whenever p is close to $x_0$, since $1 + \sum_{k=1}^{n} p_k f(x_k)$ is less than 2 in this case, and $1 + f(T_l p) \ge 2$. Thus $1 + f(T_l p)$ and $1 + g(T_l p)$ will exceed the result of the L-move* for p close to $x_0$, for l = 1, 2, ..., M.

This is an important point since the crux of our proof is the fact that (8a) will always occur after a finite number of moves, by virtue of the condition in (3).

Now consider case (8b). We have

(11)  $f(p) = 1 + \sum_{k=0}^{n} p_k f(x_k) \le 1 + f(T_l p),$
      $g(p) = 1 + g(T_l p) \le 1 + \sum_{k=0}^{n} p_k g(x_k).$

Hence

(12)  $|f(p) - g(p)| \le \max \Big\{ \max_{k} |f(x_k) - g(x_k)|,\ \sup_{p} |f(T_l p) - g(T_l p)| \Big\},$

and similarly for (8c).

From (8d) we derive

(13)  $|f(p) - g(p)| \le \max \{ |f(T_l p) - g(T_l p)|,\ |f(T_{l'} p) - g(T_{l'} p)| \}.$

We now iterate these inequalities. For any fixed p, $T_{l_1} T_{l_2} \ldots T_{l_n} p$ will be in the region governed by (8a) for n large enough. Consequently, we obtain

(14)  $\sup_{p} |f(p) - g(p)| \le \max_{k} |f(x_k) - g(x_k)|.$

This completes the proof of the lemma.

It remains to show that $\max_{k} |f(x_k) - g(x_k)| = 0$. Let k be an index at which the maximum is assumed. It follows from the functional equation for f and g that

(15)  $f(x_k) = 1 + f(T_l x_k), \quad l = l(k),$
      $g(x_k) = 1 + g(T_{l'} x_k), \quad l' = l'(k).$
* i.e., L-choice.
As above we have

(16)  $f(x_k) = 1 + f(T_l x_k) \le 1 + f(T_{l'} x_k),$
      $g(x_k) = 1 + g(T_{l'} x_k) \le 1 + g(T_l x_k).$

If both inequalities are proper, we have

(17)  $|f(x_k) - g(x_k)| < \max \big[ |f(T_l x_k) - g(T_l x_k)|,\ |f(T_{l'} x_k) - g(T_{l'} x_k)| \big] \le \sup_{p} |f(p) - g(p)|,$

a contradiction.

Thus, for either f or g we have

(18)  $f(x_k) = 1 + f(T_{l'} x_k)$, or
      $g(x_k) = 1 + g(T_l x_k).$

This means that the first choices from the position $x_k$ can be the same.

Consider now the situation for second moves. Using the same argument, we see that the second moves, i.e., the equations for $f(T_l x_k)$ and $g(T_l x_k)$ can also be the same, and so on, inductively.

Let $p_n = p_n(x_k)$ be the distribution achieved after n moves, where the (n + 1)st move puts $x_k$ into the region governed by (8a). The argument above shows that f and g land in this region on the same move. Thus, writing $p_{in}$ for the ith component of $p_n$,

(19)  $f(x_k) = (n + 1) + \sum_{i \ge 0} p_{in} f(x_i),$
      $g(x_k) = (n + 1) + \sum_{i \ge 0} p_{in} g(x_i),$

and consequently

(20)  $|f(x_k) - g(x_k)| \le \sum_{i \ge 1} p_{in} |f(x_i) - g(x_i)| \le |1 - p_{0n}| \max_{i} |f(x_i) - g(x_i)|.$

Since $1 > p_{0n} > 0$, this implies that $|f(x_k) - g(x_k)| = 0$. Hence $\sup_{p} |f(p) - g(p)| = 0$, which completes our uniqueness proof.

§  9.  An  “optimal  inventory”  equation 

In this section we shall discuss the equation

(1)  $f(x) = \inf_{y \ge x} \Big[ k(y - x) + a \Big[ \int_y^{\infty} p(s - y)\, \varphi(s)\, ds + f(0) \int_y^{\infty} \varphi(s)\, ds + \int_0^{y} f(y - s)\, \varphi(s)\, ds \Big] \Big],$

for x ≥ 0, which we have already mentioned above in its more general form involving a Stieltjes integral. As we shall see in the following chapter this is an equation which occurs in the study of "optimal inventory" or "stock level" control. The proof of existence and uniqueness of solution logically appears here since we shall employ the same techniques as in the previous sections.

Consistent with the policy we have followed throughout, we shall not consider the general equation, involving Stieltjes integrals.

To simplify the subsequent notation, set

(2)  $T(y, x, f) = k(y - x) + a \Big[ \int_y^{\infty} p(s - y)\, \varphi(s)\, ds + f(0) \int_y^{\infty} \varphi(s)\, ds + \int_0^{y} f(y - s)\, \varphi(s)\, ds \Big].$

The equation in (1) then has the form

(3)  $f(x) = \inf_{y \ge x} T(y, x, f).$

Let us impose the following conditions:

(4)  a. $\varphi(s) \ge 0, \quad \int_0^{\infty} \varphi(s)\, ds = 1.$

     b. p(s) is monotone increasing, continuous, and $\int_0^{\infty} p(s)\, \varphi(s)\, ds < \infty$.

     c. k(y) is continuous for y ≥ 0, $k(\infty) = \infty$.

     d. 0 < a < 1.

Under these conditions, we have the result

Theorem 6. There is a unique solution to (1) which is bounded for x in any finite interval. This solution f(x) is continuous.

Let $f_0(x)$ be any non-negative continuous function defined over x ≥ 0. Define the sequence $\{f_n(x)\}$ as follows,

(5)  $f_{n+1}(x) = \min_{y \ge x} T(y, x, f_n), \quad n = 0, 1, 2, \ldots.$

Then $f(x) = \lim_{n \to \infty} f_n(x)$ exists for x ≥ 0 and is the solution of

(6)  $f(x) = \min_{y \ge x} T(y, x, f).$

Proof. The proof follows very familiar lines. For each n ≥ 1, let $y_n = y_n(x)$ be a value of y for which $T(y, x, f_n)$ attains its minimum. Since $f_0(x)$ is continuous by assumption, we see inductively that each element of the sequence is continuous. Since $T(\infty, x, f_n) = \infty$, the minimum is attained.

We have then,

(7)  $f_{n+1} = T(y_n, x, f_n) \le T(y_{n-1}, x, f_n),$
     $f_n = T(y_{n-1}, x, f_{n-1}) \le T(y_n, x, f_{n-1}).$

Combining these inequalities in the usual way, we obtain

(8)  $|f_{n+1} - f_n| \le \max \{ |T(y_n, x, f_n) - T(y_n, x, f_{n-1})|,\ |T(y_{n-1}, x, f_n) - T(y_{n-1}, x, f_{n-1})| \}$

or

(9)  $|f_{n+1} - f_n| \le \max \Big[ a \int_0^{y_n} |f_n(y_n - s) - f_{n-1}(y_n - s)|\, \varphi(s)\, ds + a\, |f_n(0) - f_{n-1}(0)| \int_{y_n}^{\infty} \varphi(s)\, ds,$
     $\qquad a \int_0^{y_{n-1}} |f_n(y_{n-1} - s) - f_{n-1}(y_{n-1} - s)|\, \varphi(s)\, ds + a\, |f_n(0) - f_{n-1}(0)| \int_{y_{n-1}}^{\infty} \varphi(s)\, ds \Big].$

Hence

(10)  $\max_{0 \le x < \infty} |f_{n+1}(x) - f_n(x)| \le a \max_{0 \le x < \infty} |f_n(x) - f_{n-1}(x)| \int_0^{\infty} \varphi(s)\, ds \le a \max_{0 \le x < \infty} |f_n(x) - f_{n-1}(x)|.$
Thus the series $\sum_{n=0}^{\infty} (f_{n+1}(x) - f_n(x))$ converges uniformly in a finite interval for all x ≥ 0, and $f_n(x)$ converges to f(x) for all x ≥ 0. Since each $f_n(x)$ is continuous, f(x) is also continuous.

To prove uniqueness, let F(x) be another solution which is uniformly bounded for x ≥ 0. Using the same technique as above for the two equations

(11)  $F(x) = \min_{y \ge x} T(y, x, F),$
      $f(x) = \min_{y \ge x} T(y, x, f),$

we readily show that F(x) - f(x) is identically zero. The case where Min is replaced by Inf in (1) is again handled by an approximation process.
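The successive approximations of Theorem 6 can be tried numerically on a discretized analogue of (1). The sketch below is only an illustration under assumptions that are not in the text: demand takes the integer values 0 to 5 with the probabilities in `DEMAND` (so the integrals become sums, as in the Stieltjes form), the costs k and p are taken linear, and the stock level after ordering is capped at `Y_MAX`. All names and numerical values are hypothetical.

```python
A = 0.9                                   # discount factor a, 0 < a < 1
ORDER_COST = 1.0                          # assumed k(z) = ORDER_COST * z
PENALTY = 4.0                             # assumed p(z) = PENALTY * z
DEMAND = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]   # assumed P(demand = s), s = 0..5
Y_MAX = 12                                # truncation of the stock level y


def T(y, x, f):
    """Discrete analogue of T(y, x, f) in (2); f is a list indexed by stock level."""
    shortage = sum(PENALTY * (s - y) * w for s, w in enumerate(DEMAND) if s > y)
    stockout = sum(w for s, w in enumerate(DEMAND) if s > y)
    carry = sum(f[y - s] * w for s, w in enumerate(DEMAND) if s <= y)
    return ORDER_COST * (y - x) + A * (shortage + f[0] * stockout + carry)


def step(f):
    """One successive approximation: f_{n+1}(x) = min over y >= x of T(y, x, f_n)."""
    return [min(T(y, x, f) for y in range(x, Y_MAX + 1)) for x in range(Y_MAX + 1)]


def solve(tol=1e-10, max_iter=1000):
    f = [0.0] * (Y_MAX + 1)               # f_0 = 0, non-negative as required
    gaps = []                             # max_x |f_{n+1}(x) - f_n(x)| per iteration
    for _ in range(max_iter):
        g = step(f)
        gaps.append(max(abs(u - v) for u, v in zip(f, g)))
        f = g
        if gaps[-1] < tol:
            break
    return f, gaps


f_star, gaps = solve()
print(len(gaps), [round(v, 4) for v in f_star[:4]])
```

The list `gaps` records the successive differences, each of which should be at most a times the preceding one, which is the discrete counterpart of inequality (10).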
Finally, let us note that if we take

(12)  $f_1(x) = \min_{y \ge x} \Big[ k(y - x) + a \int_y^{\infty} p(s - y)\, \varphi(s)\, ds \Big],$
      $f_2(x) = \min_{y \ge x} [T(y, x, f_1)],$

and so on, we obtain monotone increasing convergence, since $f_2(x) \ge f_1(x)$, and hence inductively, $f_{n+1}(x) \ge f_n(x)$ for all n.

On the other hand, we may also obtain monotone convergence by approximating in policy space. We may set y = x for all x ≥ 0 and obtain as our first approximation

(13)  $f_1(x) = a \int_x^{\infty} p(s - x)\, \varphi(s)\, ds + a \int_0^{x} f_1(x - s)\, \varphi(s)\, ds + a f_1(0) \int_x^{\infty} \varphi(s)\, ds$

for x ≥ 0.

This equation is a "renewal equation" whose solution we shall discuss in an appendix to the following chapter.

Determining $f_2(x)$ by means of the equation

(14)  $f_2(x) = \min_{y \ge x} \Big[ k(y - x) + a \int_y^{\infty} p(s - y)\, \varphi(s)\, ds + a f_1(0) \int_y^{\infty} \varphi(s)\, ds + a \int_0^{y} f_1(y - s)\, \varphi(s)\, ds \Big],$

it follows that $f_2(x) \le f_1(x)$. We thus obtain monotone decreasing convergence if we set

(15)  $f_{n+1}(x) = \min_{y \ge x} T(y, x, f_n).$

Exercises  and  Research  Problems  for  Chapter  IV 

1. Determine the structure of the optimal policies associated with the functional equation

$f(p) = \max_{q} [R(p, q) + f(T(p, q))]$

under the assumption that R(p, q) and T(p, q) are convex functions of p and q, and that R(p, q) and T(p, q) are monotone increasing functions of p for each q.
2. Carry through the details of an existence and uniqueness theorem for the system of equations

$f_i(p) = \max_{q} \Big[ g_i(p, q) + \sum_{j=1}^{N} \int_{r \in D} f_j(r)\, dG_{ij}(p, q, r) \Big], \quad i = 1, 2, \ldots, N.$

3. Show that we obtain an equation belonging to this class if we add to Problem 45 of Chapter I the further condition that at any stage there is a probability $p_1$ that the trade-in value will be ruled by the function $t_1(x)$ and a probability $p_2 = 1 - p_1$ that it will be ruled by the function $t_2(x)$.

4. Consider the multi-dimensional process where the resources at any stage are measured by the non-negative vector p. At each stage p is divided into r non-negative vectors $q_j$, $p = \sum_{j=1}^{r} q_j$. As a result of this allocation, we obtain a return $R(q) = R(q_j)$ and assume a cost of $\sum_{j=1}^{r} (c_j, q_j)$. Here (c, q) denotes the inner product of the two vectors.

Let $F_N(z)$ denote the cost incurred obtaining a total return of z in N stages, employing an optimal policy. Show that

$F_1(z) = \min_{R(q) = z,\ q \ge 0} \sum_{j=1}^{r} (c_j, q_j),$

$F_{N+1}(z) = \min_{q \ge 0} \Big[ \sum_{j=1}^{r} (c_j, q_j) + F_N(z - R(q)) \Big].$

5. Under what conditions does the limiting equation

$F(z) = \min_{q \ge 0} \Big[ \sum_{j=1}^{r} (c_j, q_j) + F(z - R(q)) \Big], \quad F(0) = 0,$

have a solution?

6.  How  can  the  following  problem  be  formulated  mathematically  ?  We 
are  lost  in  a  forest  whose  shape  and  dimensions  are  precisely  known  to  us. 
How  do  we  get  out  in  the  shortest  time  ? 

7.  Consider  the  case  where  the  “forest”  is  the  region  between  two  parallel 
lines.  (Gross). 

8.  Generalize  the  result  of  Theorem  5  by  considering  processes  in  which 
we  have  either  a  denumerable  number  of  different  transformations  at 
each  stage,  or  a  continuum  of  transformations. 

9.  Consider  the  still  more  general  process  where  there  are  a  denumerable 
number,  or  continuum,  of  states. 
and, for 0 < n < 1


=  Max 

w  >  (I  L 


b  —  (1  —  n)  ( m/m )"/<"  -1 
1  +  au 


14. Show that if $\varphi(x)$ is strictly convex and differentiable, we have

$\varphi(x) = \max_{u} [\varphi(u) - (u - x)\, \varphi'(u)],$

and if concave,

$\varphi(x) = \min_{u} [\varphi(u) - (u - x)\, \varphi'(u)].$

Give both analytic and geometric proofs.
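The representation in Exercise 14 says that a convex function is the upper envelope of its tangent lines and a concave function the lower envelope. The following Python sketch checks this numerically on a grid for two example functions chosen here purely for illustration, $e^x$ (convex) and $\log x$ (concave); the grid `U_GRID` and the function names are assumptions, and the check is of course not a proof.

```python
import math


def tangent_value(phi, dphi, u, x):
    """Value at x of the tangent line to phi drawn at the point u."""
    return phi(u) - (u - x) * dphi(u)


U_GRID = [i / 100 for i in range(1, 501)]   # candidate points u in (0, 5]


def convex_envelope(x):
    """Max over u of the tangents to exp; should reproduce exp(x)."""
    return max(tangent_value(math.exp, math.exp, u, x) for u in U_GRID)


def concave_envelope(x):
    """Min over u of the tangents to log; should reproduce log(x)."""
    return min(tangent_value(math.log, lambda u: 1.0 / u, u, x) for u in U_GRID)


x = 1.5
print(convex_envelope(x), math.exp(x))
print(concave_envelope(x), math.log(x))
```

Because x = 1.5 lies on the grid, the extremum is attained at u = x, where the correction term $(u - x)\varphi'(u)$ vanishes.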

15. Consider the multi-dimensional analogue,

$\varphi(x_1, x_2) = \max_{u_1, u_2} \Big[ \varphi(u_1, u_2) - (u_1 - x_1) \frac{\partial \varphi}{\partial u_1} - (u_2 - x_2) \frac{\partial \varphi}{\partial u_2} \Big],$

for convex functions, and the corresponding result for concave functions.

16. Discuss the possibility of using these results to obtain explicit solutions for non-linear systems of the form

9.  (v„  -Vt)  =  v,.  (.v„  -v,)  =  xt. 

where  and  are  both  concave  or  both  convex. 

17. Newton's method furnishes a sequence of successive approximations

$x_{n+1} = x_n - f(x_n) / f'(x_n)$

to the solution of f(x) = 0. Show that if f'(x) > 0 in [a, b] and also f''(x) > 0 in this interval, we have

$x = \min_{a \le y \le b} [\, y - f(y) / f'(y) \,]$

for a root in this interval.

Obtain corresponding expressions for the multi-dimensional case.
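The statement of Exercise 17 can be explored numerically. The sketch below uses the example $f(y) = y^2 - 2$ on [1, 2], which is our own choice and satisfies f' > 0 and f'' > 0 there; it compares the Newton iterates with a grid minimum of the Newton map $y - f(y)/f'(y)$. The function names and the grid size are illustrative assumptions.

```python
import math


def f(y):
    return y * y - 2.0          # example function, root sqrt(2) in [1, 2]


def df(y):
    return 2.0 * y              # f'(y) > 0 on [1, 2]


def newton(x, steps=8):
    """Successive approximations x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x


def newton_map_minimum(a, b, points=10001):
    """Grid approximation of the minimum of y - f(y)/f'(y) over [a, b]."""
    grid = [a + (b - a) * i / (points - 1) for i in range(points)]
    return min(y - f(y) / df(y) for y in grid)


print(newton(2.0), newton_map_minimum(1.0, 2.0), math.sqrt(2.0))
```

Under the stated convexity assumption every value of the Newton map lies on or above the root, so the grid minimum approaches the root from above as the grid is refined.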


18. Consider the two equations

(a)  $v(p) = L(v, p, q) + a(p, q),$

(b)  $u(p) = \max_{q} [L(u, p, q) + a(p, q)],$

where u(p) is a scalar function of a vector p, belonging to a region R, and q a vector variable, belonging to a set D which may or may not depend upon p.

Assume that

(1) There is a unique solution of (a) for any fixed q = q(p), denoted by v(p, q), for p in R.

(2) There is a unique solution of (b) for p in R.

(3) If $w(p) \ge L(w, p, q) + a(p, q)$ for a fixed q = q(p), then $w(p) \ge v(p, q)$.

Prove that under these assumptions we have

$u(p) = \max_{q} v(p, q).$

19. Under what assumptions concerning the matrix $A(p, q) = (a_{ij}(p, q))$, can we determine the solutions of the systems

$u_i(p) = \max_{q} \Big[ a_i(p, q) + \sum_{j=1}^{N} a_{ij}(p, q)\, u_j(p) \Big], \quad i = 1, 2, \ldots, N,$

or

$\max_{q} \Big[ a_i(p, q)\, u_i(p) + \sum_{j=1}^{N} a_{ij}(p, q)\, u_j(p) \Big] = c_i, \quad i = 1, 2, \ldots, N,$

in the above fashion?

20. Let $F_1(x) = G_1(x) = x$, and

$F_n(x_1, x_2, \ldots, x_n) = \max (x_1, G_{n-1}(x_2, \ldots, x_n)),$

$G_n(x_1, x_2, \ldots, x_n) = \min (x_1, F_{n-1}(x_2, \ldots, x_n)).$

Prove that

$\lim_{n \to \infty} \int_0^1 \cdots \int_0^1 F_n(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n = \pi \sqrt{3} / 9,$

$\lim_{n \to \infty} \int_0^1 \cdots \int_0^1 G_n(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n = 1 - \pi \sqrt{3} / 9.$

(Gross-Wang, Amer. Math. Monthly, Vol. 63 (1956), p. 589).
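The limits in Exercise 20 can be checked numerically without Monte Carlo sampling. Regarding the $x_i$ as independent and uniform on [0, 1], the distribution functions $A_n(t) = \mathrm{Prob}(F_n \le t)$ and $B_n(t) = \mathrm{Prob}(G_n \le t)$ satisfy $A_n = t B_{n-1}$ and $1 - B_n = (1 - t)(1 - A_{n-1})$, and the integrals are the means $\int_0^1 (1 - A_n)\, dt$ and $\int_0^1 (1 - B_n)\, dt$. The sketch below is our own reformulation, not the method of the cited paper; it uses Simpson's rule, and the name `expected_values` is illustrative.

```python
import math


def expected_values(n, intervals=2000):
    """Return the integrals of F_n and G_n over the unit cube (intervals must be even)."""

    def cdfs(t):
        A = B = t                          # F_1 = G_1 = x, uniform on [0, 1]
        for _ in range(n - 1):
            # F_n = max(x_1, G_{n-1}),  G_n = min(x_1, F_{n-1})
            A, B = t * B, 1.0 - (1.0 - t) * (1.0 - A)
        return A, B

    h = 1.0 / intervals
    ef = eg = 0.0
    for i in range(intervals + 1):
        w = 1 if i in (0, intervals) else (4 if i % 2 else 2)   # Simpson weights
        A, B = cdfs(i * h)
        ef += w * (1.0 - A)                # E[F_n] = integral of 1 - A_n
        eg += w * (1.0 - B)                # E[G_n] = integral of 1 - B_n
    return ef * h / 3.0, eg * h / 3.0


ef, eg = expected_values(60)
print(ef, eg, math.pi * math.sqrt(3.0) / 9.0)
```

For n = 60 the two computed means agree with the stated limits to many decimal places, since the recursion for the distribution functions contracts quickly.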

21. Let the $y_i$ be independent random variables assuming the value 1 with probability p and the value 0 with probability 1 - p. Let the $x_i$ be a set of positive quantities. Set

$g_N(x; x_i) = \mathrm{Prob} \Big\{ \sum x_i y_i \Big/ \sum x_i \ge x \Big\},$

and $f_N(x) = \inf_{x_i} g_N(x; x_i)$.

Show that

$f_N(x) \ge \min_{s} \Big[ p f_{N-1}\Big( \frac{x - s}{1 - s} \Big) + (1 - p) f_{N-1}\Big( \frac{x}{1 - s} \Big) \Big],$

and thus obtain a non-trivial uniform lower bound for $g_N(x; x_i)$. (Harris)

22. Under what conditions does there exist a unique solution of the equation

$u(x) = \min_{i} \sum_{j=1}^{N} p_{ij}(x)\, u(x + a_{ij}), \quad 0 < x < C,$
$u(x) = 0, \quad x \le 0,$
$u(x) = 1, \quad x \ge C,$

where, for 0 < x < C,

(a)  $p_{ij}(x) \ge 0,$

(b)  $\sum_{j=1}^{N} p_{ij}(x) = 1.$

Consider the case where x assumes only a discrete set of values, $\{k\Delta\}$, and $a_{ij} = m_{ij}\Delta$, where $m_{ij}$ is a positive or negative integer.

23. Consider problem 15 in the exercises at the end of Chapter 3. Show that the problem of determining minimum cost is equivalent to the problem of determining the minimum of $L_N(x) = \sum_{k=1}^{N} x_k$ subject to the constraints

a.  $x_i \ge 0,$

b.  $x_k + x_{k+1} + \cdots + x_{k+h} \ge a_k, \quad k = 1, 2, \ldots, N,$

where $x_{N+k} = x_k$.

(Management Science, 1957).

24. Consider the more general problem of determining the minimum of $L_N(y) = y_1 + y_2 + \ldots + y_N$ subject to the constraints

(a)  $y_i \ge 0,$

(b)  $y_1 \ge r, \quad y_N \ge s,$

(c)  $y_1 + y_2 \ge b_1,$
     $y_2 + y_3 \ge b_2,$
     ...
     $y_{N-1} + y_N \ge b_{N-1}.$

Write, for fixed r, $f_N(s) = \min L(y)$, N ≥ 2. Show that

$f_2(s) = \max (s + r, b_1),$

$f_N(s) = \min_{z \ge s^+} [\, z + f_{N-1}(b_{N-1} - z) \,],$

where $s^+ = \max (s, 0)$.

Show that

$f_k(s) = \max (s + u_k, v_k), \quad k = 1, 2, \ldots,$

where $u_k$ and $v_k$ are functions of r.


25. Show that

$u_k = \max (r + \alpha_k, \beta_k),$
$v_k = \max (r + \gamma_k, \delta_k),$

for k ≥ 3, where

$\alpha_{k+1} = \gamma_k,$
$\beta_{k+1} = \delta_k,$
$\gamma_{k+1} = \max (\alpha_k + b_k, \gamma_k),$
$\delta_{k+1} = \max (\beta_k + b_k, \delta_k).$


26. Consider, in like fashion, the problem of minimizing $L_N(x) = \sum_{i=1}^{N} x_i$ subject to the constraints

a.  $x_i \ge 0,$

b.  $x_1 \ge x,$

c.  $x_1 + x_2 \ge y,$

d.  $x_1 + x_2 + x_3 \ge b_1,$
    $x_2 + x_3 + x_4 \ge b_2,$
    ...
    $x_{N-2} + x_{N-1} + x_N \ge b_{N-2},$
    $x_{N-1} + x_N \ge s,$
    $x_N \ge r.$

27. Consider the problem of minimizing $L_N(x) = \sum_{i=1}^{N} c_i x_i$ subject to the constraints

a.  $x_i \ge 0,$

b.  $b_{11} x_1 + b_{12} x_2 \ge b_1,$
    $b_{22} x_2 + b_{23} x_3 \ge b_2,$
    ...
    $b_{N-1, N-1}\, x_{N-1} + b_{N-1, N}\, x_N \ge b_{N-1},$

c.  $x_1 \ge x, \quad x_N \ge r.$

Obtain the corresponding functional equation and the analogues of the above results under suitable assumptions concerning the coefficients $b_{ij}$.

28. Let us suppose that we are given a map containing N distinct locations numbered in some fashion i = 1, 2, ..., N, and a matrix $T = (t_{ij})$ telling us the time required to travel from i to j, with $t_{ii} = 0$. Starting at the first location, we wish to pursue a route which minimizes the total time required to travel to the Nth point, using any of the other locations, and only these, as intermediary stops.

Let $f_i$ denote the time required to go from i to N, i = 1, 2, ..., N - 1, $f_N = 0$, using an optimal policy. Show that

$f_i = \min_{j \ne i} [\, t_{ij} + f_j \,], \quad i = 1, 2, \ldots, N - 1.$

29. Show this equation has a solution $\{f_i\}$ unique up to an additive constant.

30. Show that any one of these solutions suffices to determine the optimal policy.

31. Consider the following approximation in policy space,

$f_i^{(0)} = t_{i, i+1} + t_{i+1, i+2} + \ldots + t_{N-1, N},$

for i = 1, 2, ..., N - 1, and let the sequence $\{f_i^{(k)}\}$ be defined by

$f_i^{(k+1)} = \min_{j \ne i} [\, t_{ij} + f_j^{(k)} \,], \quad i = 1, 2, \ldots, N - 1, \quad k = 0, 1, 2, \ldots.$

Show that the vectors $\{f_i^{(k)}\}$ converge to a solution of the above functional equation, and thus may be used to determine optimal policies.*
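The iteration of Exercise 31 is easy to try on a small map. The following Python sketch is a minimal illustration with a hypothetical 4-location travel-time matrix `T_EXAMPLE` (indices 0 to 3, so the terminal location N is index 3); it assumes non-negative times and keeps the terminal value at zero. The function names are our own.

```python
def route_times(t):
    """Iterate f_i <- min over j != i of (t[i][j] + f_j), with f at the terminal point 0."""
    n = len(t)
    # Initial approximation in policy space: go i -> i+1 -> ... -> terminal.
    f = [0.0] * n
    for i in range(n - 2, -1, -1):
        f[i] = t[i][i + 1] + f[i + 1]
    # With non-negative times, n - 1 sweeps are enough to reach the fixed point.
    for _ in range(n - 1):
        g = [min(t[i][j] + f[j] for j in range(n) if j != i) for i in range(n - 1)]
        g.append(0.0)
        if g == f:
            break
        f = g
    return f


def next_stop(t, f, i):
    """Optimal first move from location i, read off from the solution f."""
    return min((j for j in range(len(t)) if j != i), key=lambda j: t[i][j] + f[j])


T_EXAMPLE = [
    [0, 5, 2, 9],
    [5, 0, 1, 6],
    [2, 1, 0, 3],
    [9, 6, 3, 0],
]
f = route_times(T_EXAMPLE)
print(f, [next_stop(T_EXAMPLE, f, i) for i in range(3)])
```

Starting from the times of the sequential route, each sweep can only lower the estimates, so the iterates decrease monotonically to the minimum times; the optimal policy is then recovered from the minimizing index j at each location.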

* R. Bellman, A Routing Problem, Quarterly of Applied Mathematics, 1957.

32. Consider the problem of maximizing $\sum_{i=1}^{2N} g_i(x_i)$ subject to $\sum_{i=1}^{2N} x_i \le c$, $x_i \ge 0$. Show that this is equivalent to maximizing $f_N(y_1) + h_N(y_2)$ subject to $y_1, y_2 \ge 0$, $y_1 + y_2 = c$, where

$f_N(y_1) = \max_{R_1} \sum_{i=1}^{N} g_i(x_i),$

$h_N(y_2) = \max_{R_2} \sum_{i=N+1}^{2N} g_i(x_i),$

and $R_1$, $R_2$ are defined by

$R_1$:  $x_i \ge 0, \quad \sum_{i=1}^{N} x_i \le y_1,$

$R_2$:  $x_i \ge 0, \quad \sum_{i=N+1}^{2N} x_i \le y_2.$

What computational advantages are there in employing this technique and its natural extension? Discuss the multi-dimensional case.

33. A gambler receives advance information concerning the outcomes of a sequence of independent sporting events over a noisy communication channel. We assume that the outcome of each event is the result of play between two evenly matched teams, and that p is the probability of a correct transmission, and q = 1 - p, the probability of incorrect transmission.

Assuming that the gambler starts with an initial amount x, and bets on the outcome of each event so as to maximize his expected capital at the end of N stages of play, show that he wagers his entire capital at each stage, provided that p > 1/2, and nothing if p < 1/2.

34. Let us assume that the gambler plays so as to maximize the expected value of the logarithm of his capital after N stages. Assuming that he uses the same betting policy at each stage, determine this ratio of the amount bet to the total capital.

(J. Kelly, "A New Interpretation of Information Rate," 1956 Symposium on Information Theory, Transactions I.R.E., 1956, pp. 185-189).

35. Let us assume that the gambler plays so as to maximize the expected value of the logarithm of his capital after N stages. Let $f_N(x)$ denote the expected value obtained using an optimal policy. Show that

$f_{N+1}(x) = \max_{0 \le y \le x} [\, p f_N(x + y) + q f_N(x - y) \,], \quad N = 1, 2, \ldots,$

assuming that there are equal odds, with

$f_1(x) = \max_{0 \le y \le x} [\, p \log (x + y) + q \log (x - y) \,].$

(For this and the following results, see R. Bellman and R. Kalaba, "On the Role of Dynamic Programming in Statistical Communication Theory," Transactions I.R.E., 1957.)
36. Show inductively that

$f_N(x) = \log x + N k,$

where

$k = \max_{0 \le r \le 1} [\, p \log (1 + r) + q \log (1 - r) \,],$

and hence that there is a number $r_0$ such that the optimal policy at each stage is determined by the relation $y = r_0 x$.
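The constant k and the betting fraction $r_0$ of Exercise 36 can be located numerically by a simple grid search. The Python sketch below is only an illustration: the value p = 0.6, the grid size, and the names `growth` and `best_fraction` are assumptions made here. The closed forms quoted in the comments, $r_0 = p - q$ and $k = \log 2 + p \log p + q \log q$ for p > 1/2, are the well-known result of the Kelly paper cited above and serve only as a check.

```python
import math


def growth(r, p):
    """Expected logarithmic gain per stage when the fraction r of capital is bet."""
    return p * math.log(1.0 + r) + (1.0 - p) * math.log(1.0 - r)


def best_fraction(p, steps=10000):
    """Grid search for the maximizing r over [0, 1); r = 1 is excluded to avoid log(0)."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda r: growth(r, p))


p = 0.6
r0 = best_fraction(p)
k = growth(r0, p)
# Known closed forms for p > 1/2: r0 = p - q and k = log 2 + p log p + q log q.
print(r0, k, math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
```

Since the maximizing fraction does not depend on x, the same proportion of current capital is wagered at every stage, which is the structure asserted in the exercise.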

37.  Consider  the  time-dependent  case  where  the  probability  of  correct 
transmission  depends  on  the  stage.  Establish  the  corresponding  functional 
equation  and  deduce  the  structure  of  the  optimal  policy. 

38.  For  the  case  where  the  purpose  of  the  process  is  to  maximize  the 
expected  value  of  the  return,  or  the  logarithm  of  the  return  after  N 
stages,  the  above  analysis  shows  that  the  optimal  policy  is  independent 
of  the  quantity  of  resources  available  at  each  stage. 

Consider the problem of determining the class of criterion functions possessing this property. Let $\varphi(x)$ be a monotone increasing concave function defined over $0 \le x < \infty$, normalized by the condition $\varphi'(1) = 1$ and consider the one-stage process where we wish to maximize

$E(y) = p\, \varphi(x + y) + (1 - p)\, \varphi(x - y)$

for $0 \le y \le x$, where $1 \ge p > 1/2$. Show that if for all x > 0, there is a maximum of the form y = r(p) x, then we must have

$\varphi(y) = \frac{y^{k+1}}{k + 1} + c_1, \quad k > -1,$

or, as an extreme case,

$\varphi(y) = \log y + c_1.$

39. Consider the case where successive signals are not independent. Let the probability of a correct transmission at the kth stage depend upon the transmission of the signal at the (k - 1)st stage. Define, for x ≥ 0, k = 1, 2, ..., N,

$f_k(x)$ = expected value of the logarithm of the final capital obtained from the remaining k stages of the original N-stage process, starting with an initial capital x, the information that the (k - 1)st signal was transmitted correctly, and using an optimal policy.

$g_k(x)$ = the corresponding function in the case where the (k - 1)st signal was transmitted incorrectly.

Then

$f_k(x) = \max_{0 \le y \le x} [\, p_{N-k+1} f_{k-1}(x + y) + (1 - p_{N-k+1})\, g_{k-1}(x - y) \,],$

$g_k(x) = \max_{0 \le y \le x} [\, q_{N-k+1} f_{k-1}(x + y) + (1 - q_{N-k+1})\, g_{k-1}(x - y) \,],$

where

$p_k$ = probability of correct transmission of the kth signal if the (k - 1)st signal was transmitted correctly.

$q_k$ = probability of correct transmission of the kth signal if the (k - 1)st signal was transmitted incorrectly.

Show that $f_k(x) = \log x + a_k$, $g_k(x) = \log x + b_k$. Determine $a_k$ and $b_k$ and the structure of the optimal policy.

40. Consider the situation in which the channel transmits any of M different symbols. Upon receiving a symbol the gambler must make bets on what he believes the transmitted signal actually was. Assume that the gambler possesses the following information:

$p_{ij}$ = the conditional probability that the j-signal was sent if the i-signal is received.

$q_i$ = the probability of receiving the i-signal.

$r_j$ = the return from a unit winning bet on signal j.

Assume that the gambler is free to bet an amount $z_i$ on the ith signal, subject to the restriction that $\sum_i z_i \le x$. Defining the sequence $\{f_N(x)\}$ as above, show that

$f_N(x) = \sum_{i=1}^{M} q_i \max_{\sum z_j \le x,\ z_j \ge 0} \Big[ \sum_{j=1}^{M} p_{ij}\, f_{N-1}\Big( r_j z_j + x - \sum_{k=1}^{M} z_k \Big) \Big],$

$f_1(x) = \sum_{i=1}^{M} q_i \max_{\sum z_j \le x,\ z_j \ge 0} \Big[ \sum_{j=1}^{M} p_{ij} \log \Big( r_j z_j + x - \sum_{k=1}^{M} z_k \Big) \Big].$

Prove, as before, that $f_N(x) = \log x + N a_1$, determine $a_1$ and the structure of the optimal policy. Show that the optimal policy is independent of the $q_i$.

41. Consider the case in which there are a continuum of different signals. Let

dG(u, v) = conditional probability that a signal with label between v and v + dv is sent if the u-signal is received.

dH(u) = probability that a signal with label between u and u + du is received at any stage.

Show that the corresponding functional equations are

$f_N(x) = \int_{-\infty}^{\infty} \Big[ \max_{z(v)} \int_{-\infty}^{\infty} f_{N-1}(2 z(v))\, dG(u, v) \Big] dH(u), \quad N \ge 2,$

$f_1(x) = \int_{-\infty}^{\infty} \Big[ \max_{z(v)} \int_{-\infty}^{\infty} \log (2 z(v))\, dG(u, v) \Big] dH(u),$

assuming for the sake of simplicity that the odds are even, and that all money must be bet. The maximization is over all functions satisfying

a.  $z(v) \ge 0,$

b.  $\int_{-\infty}^{\infty} z(v)\, dv = x.$

Obtain the form of $f_N(x)$ and the structure of the optimal policy.


42. Consider the case where p itself is a random variable, subject to a known probability distribution.

43. Consider the case in which the probability distribution is unknown. We do, however, have an a priori estimate dG(p), and agree, after k successful transmissions and l unsuccessful transmissions, that the new a priori estimate is to be

$dG_{k,l}(p) = \frac{p^k (1 - p)^l\, dG(p)}{\int_0^1 p^k (1 - p)^l\, dG(p)}.$

44. Several industrial plants are located along a river, numbered from north to south, 1, 2, ..., N. A certain quantity of water flows down this river, to be allocated along the way to these plants. Assume to begin with that water allocated to a plant cannot be used by any other plants, and determine the allocation policy which maximizes the return to the community. (W. Hall)

45. Consider the same problem under the assumption that a certain quantity of the water allocated to each industry returns to the river, sometimes immediately, and sometimes several stages further down. (W. Hall)

46. Suppose that the waste products of each industry pollute the water, and the cost of using this water depends on the pollution level. Determine the optimal allocation policy in this case. (W. Hall)

47. Suppose that the quantity of water available is seasonal, and that the demand is seasonal. Dams exist at various places along the river where water can be stored. Determine the optimal allocation policy. (W. Hall)

48. There are $n$ different industrial plants whose construction along a
river is being considered. The $i$-th plant has production value $v_i$, discharges
waste products in quantity $w_i$ into the river, and has a tolerance level $t_i$,
which, if the plant is to be utilized, must exceed the sum of the wastes
from the upstream plants. We wish to choose a subset of the $n$ plants
along the river so as to maximize the total economic value.

(L. M. K. Boelter)

Show that this is a maximization problem over $2^n \, n!$ choices, which
can be reduced to $2^n - 1$ choices. (Gross-Johnson)

49. Show that any optimal solution can be reordered by increasing
values of $t_i + w_i$ without loss of optimality, and thus that there are
fewer than $2^n$ cases to consider.

(O. Gross-S. Johnson)

50. If $v_i = 1$ for $i = 1, 2, \ldots, n$, show that an optimal solution may
be found using the following procedure:

a. Order and renumber the items according to the magnitude of
$t_i + w_i$.

b. Compute $s_i = \sum_{j=1}^{i} w_j$, and $d_i = t_i - s_{i-1}$.

c. If $d_k < 0$ is the first violation, delete an item in the set $i \le k$
whose $w_i$ is largest.

d. Recompute as in step (b) for the new set, and repeat steps (b)
and (c) until all violations are removed. (O. Gross-S. Johnson)
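
The deletion procedure of Exercise 50 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `select_plants` is hypothetical, `tolerance[i]` and `waste[i]` stand for the tolerance level and waste quantity of plant i, all production values are taken equal, and the set searched in step (c) is read as the items up to and including the first violation.

```python
def select_plants(tolerance, waste):
    """Return the indices of the plants kept, in upstream-to-downstream order."""
    # Step (a): order the items by tolerance plus waste.
    order = sorted(range(len(waste)), key=lambda i: tolerance[i] + waste[i])
    while True:
        upstream = 0.0      # total waste of the kept plants further upstream
        violation = None
        # Step (b): scan for the first plant whose tolerance is exceeded.
        for position, i in enumerate(order):
            if tolerance[i] - upstream < 0:
                violation = position
                break
            upstream += waste[i]
        if violation is None:
            return order    # step (d): no violations remain
        # Step (c): among the items up to the violation, drop the largest waste.
        candidates = order[:violation + 1]
        worst = max(candidates, key=lambda i: waste[i])
        order.remove(worst)


if __name__ == "__main__":
    print(select_plants([0, 1, 3, 2], [2, 2, 1, 1]))
```

Rescanning from the start after each deletion follows the wording of step (d) literally; it is quadratic in the number of plants, which is adequate for a sketch.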


51. Show that in the general case an optimal solution has no greater
number of items than there are in the optimal solution of the same
problem with all the $v_i$ equal. (O. Gross-S. Johnson)

52. Consider the problem of finding an approximate solution of the
equations $f(x, y) = a$, $g(x, y) = b$. Let $\{x_k, y_k\}$, $k = 0, 1, 2, \ldots$, be a
sequence of guesses, and

$$d_N = (f(x_N, y_N) - a)^2 + (g(x_N, y_N) - b)^2.$$

Assuming that $x_0 = c_1$, $y_0 = c_2$, and that $(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 \le r^2$,
for $i = 0, 1, 2, \ldots$, let for $N = 0, 1, 2, \ldots$

$$f_N(c_1, c_2) = \min_{\{x_i, y_i\}} d_N.$$

Show that

$$f_{N+1}(c_1, c_2) = \min_{R} \left[ f_N(x_1, y_1) \right],$$

where $R$ is the region determined by $(x_1 - c_1)^2 + (y_1 - c_2)^2 \le r^2$.
53. Set $x_1 = c_1 + r \cos\theta$, $y_1 = c_2 + r \sin\theta$ and assume that $r$ is a small
quantity. Then

$$f_{N+1}(c_1, c_2) \cong \min_{\theta} \left[ f_N(c_1 + r\cos\theta,\; c_2 + r\sin\theta) \right]
\cong \min_{\theta} \left[ f_N(c_1, c_2) + r \left( \cos\theta \, \frac{\partial f_N}{\partial c_1} + \sin\theta \, \frac{\partial f_N}{\partial c_2} \right) \right].$$

From this determine approximate values for $\cos\theta$ and $\sin\theta$. What is
the connection with the classical gradient method?
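
One way to explore the linearized expression of Exercise 53 numerically is sketched below in Python. It rests on assumptions: the function names are hypothetical, the two partial derivatives are supplied as plain numbers, and the candidate direction is the unit vector opposite to the gradient, which is what the classical gradient method uses. The helper `linear_change` evaluates the first-order term for any angle, so the candidate can be compared with a brute-force scan over angles.

```python
import math


def steepest_descent_direction(df_dc1, df_dc2):
    """Return (cos theta, sin theta) pointing against the gradient."""
    norm = math.hypot(df_dc1, df_dc2)
    if norm == 0.0:
        raise ValueError("gradient vanishes; all directions agree to first order")
    return -df_dc1 / norm, -df_dc2 / norm


def linear_change(df_dc1, df_dc2, theta):
    """First-order change per unit r when stepping in the direction theta."""
    return math.cos(theta) * df_dc1 + math.sin(theta) * df_dc2


if __name__ == "__main__":
    c, s = steepest_descent_direction(3.0, 4.0)
    scanned = min(linear_change(3.0, 4.0, 2.0 * math.pi * j / 3600) for j in range(3600))
    print(c, s, c * 3.0 + s * 4.0, scanned)
```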

54. Consider the problem of determining the Čebyšev norm

$$d_N = \min_{a_k} \max_{0 \le x \le 1} \left| f(x) - \sum_{k=0}^{N} a_k x^k \right|.$$

Discuss the convergence of the following scheme. Let $\{c_k^0\}$ be an initial
approximation, and $c_0'$ determined as the minimum of

$$\max_{0 \le x \le 1} \left| f(x) - c_0 - \sum_{k=1}^{N} c_k^0 x^k \right|.$$

Then let $c_1'$ be determined as the minimum of

$$\max_{0 \le x \le 1} \left| f(x) - c_0' - c_1 x - \sum_{k=2}^{N} c_k^0 x^k \right|,$$

and so on.


55. Suppose that we wish to send a rocket to the moon. Since there are
questions of cost and engineering involved in carrying large quantities
of fuel, and the containers for large quantities of fuel, we attempt to
cut down on the quantity of fuel required and the size of the rocket by
building a multi-stage rocket of the following type:

[Figure: a multi-stage rocket, showing the nose cone, the first, second, and third stages, and the nested sub-rockets.]

After the fuel carried in the last stage, the $N$-th stage, is consumed, this
stage drops off, leaving an $(N-1)$-stage sub-rocket, and so on.

The problem is to build an $N$-stage rocket of minimum weight which
will attain a final velocity of $v$. Let

$W_k$ = initial gross weight of sub-rocket $k$,
$w_k$ = initial gross weight of stage $k$,
$p_k$ = initial propellant weight of stage $k$,
$v_k$ = change in rocket velocity during burning of stage $k$.

Assume that the change in velocity $v_k$ is a known function of $W_k$ and
$p_k$, so that $v_k = v(W_k, p_k)$ and thus $p_k = p(W_k, v_k)$. Since
$W_k = W_{k-1} + w_k$, and the weight of the $k$-th stage is a known function, $g(p_k)$,
of the propellant carried in the stage, we have

$$w_k = g\big(p(W_{k-1} + w_k, v_k)\big),$$

whence, solving for $w_k$, we have

$$w_k = w(W_{k-1}, v_k).$$

Let $f_k(v)$ denote the minimum weight of sub-rocket $k$ achieving a
terminal velocity of $v$. Then

$$f_k(v) = \min_{0 \le v_k \le v} \left[ w\big(f_{k-1}(v - v_k), v_k\big) + f_{k-1}(v - v_k) \right],$$

for $k \ge 2$, with

$$f_0(v) = W_0 = \text{weight of nose cone},$$

$$f_1(v) = \min_{0 \le v_1 \le v} \left( w(W_0, v_1) + W_0 \right).$$

(R. P. Ten Dyke)

56. Consider the problem of maximizing the linear form
$L_N(x) = \sum_{i=1}^{3N} x_i$ over all non-negative $x_i$ satisfying the constraints

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 &\le c_1, \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 &\le c_2, \\
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 + b_1 x_4 &\le c_3, \\
a_{44} x_4 + a_{45} x_5 + a_{46} x_6 &\le c_4, \\
a_{54} x_4 + a_{55} x_5 + a_{56} x_6 &\le c_5, \\
a_{64} x_4 + a_{65} x_5 + a_{66} x_6 + b_2 x_7 &\le c_6, \\
&\;\;\vdots \\
a_{3N-2,3N-2} x_{3N-2} + a_{3N-2,3N-1} x_{3N-1} + a_{3N-2,3N} x_{3N} &\le c_{3N-2}, \\
a_{3N-1,3N-2} x_{3N-2} + a_{3N-1,3N-1} x_{3N-1} + a_{3N-1,3N} x_{3N} &\le c_{3N-1}, \\
a_{3N,3N-2} x_{3N-2} + a_{3N,3N-1} x_{3N-1} + a_{3N,3N} x_{3N} &\le c_{3N},
\end{aligned}$$

and $x_i \ge 0$, where the coefficients $a_{ij}$, $b_i$, $c_i$ are non-negative.
Define the sequence of functions

$$f_N(z) = \max_{x} L_N(x),$$

where the $x_i$ are subject to the constraints given above with the exception
that the last constraint is now

$$a_{3N,3N-2} x_{3N-2} + a_{3N,3N-1} x_{3N-1} + a_{3N,3N} x_{3N} \le z.$$

Show that

$$f_N(z) = \max_{x_{3N-2},\, x_{3N-1},\, x_{3N}} \left[ x_{3N-2} + x_{3N-1} + x_{3N} + f_{N-1}(c_{3N-3} - b_{N-1} x_{3N-2}) \right], \quad N \ge 1,$$

where $x_{3N-2}$, $x_{3N-1}$, $x_{3N}$ are subject to

$$\begin{aligned}
a_{3N-2,3N-2} x_{3N-2} + a_{3N-2,3N-1} x_{3N-1} + a_{3N-2,3N} x_{3N} &\le c_{3N-2}, \\
a_{3N-1,3N-2} x_{3N-2} + a_{3N-1,3N-1} x_{3N-1} + a_{3N-1,3N} x_{3N} &\le c_{3N-1}, \\
a_{3N,3N-2} x_{3N-2} + a_{3N,3N-1} x_{3N-1} + a_{3N,3N} x_{3N} &\le z, \\
b_{N-1} x_{3N-2} \le c_{3N-3}, \quad x_i &\ge 0.
\end{aligned}$$

The function $f_0(z)$ is taken to be identically zero.

57.  Obtain  corresponding  results  for  the  case  where  the  matrices  are  of 
different  order. 

58. Consider the case where the $3k$-th equation, $k = 1, 2, \ldots$, has the
form

$$a_{3k,3k-2} x_{3k-2} + a_{3k,3k-1} x_{3k-1} + a_{3k,3k} x_{3k} + b_k x_{3k+1} + c_k x_{3k+2} + d_k x_{3k+3} \le c_{3k}.$$

59. Show that the above functional equation can be reduced to the form

$$f_N(z) = \max_{x_{3N-2}} \left[ g_N(x_{3N-2}, z) + f_{N-1}(c_{3N-3} - b_{N-1} x_{3N-2}) \right],$$

where $x_{3N-2}$ satisfies an inequality

$$0 \le x_{3N-2} \le \min \left[ \beta_N,\; z / a_{3N,3N-2} \right].$$

60. Consider the problem of resolving a set of linear equations of the
form

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 &= c_1, \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 &= c_2, \\
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 + b_1 x_4 &= c_3, \\
b_1 x_3 + a_{44} x_4 + a_{45} x_5 + a_{46} x_6 &= c_4, \\
a_{54} x_4 + a_{55} x_5 + a_{56} x_6 &= c_5, \\
a_{64} x_4 + a_{65} x_5 + a_{66} x_6 + b_2 x_7 &= c_6, \\
&\;\;\vdots \\
b_N x_{3N} + a_{1+3N,1+3N} x_{1+3N} + a_{1+3N,2+3N} x_{2+3N} + a_{1+3N,3+3N} x_{3+3N} &= c_{1+3N}, \\
a_{2+3N,1+3N} x_{1+3N} + a_{2+3N,2+3N} x_{2+3N} + a_{2+3N,3+3N} x_{3+3N} &= c_{2+3N}, \\
a_{3+3N,1+3N} x_{1+3N} + a_{3+3N,2+3N} x_{2+3N} + a_{3+3N,3+3N} x_{3+3N} &= c_{3+3N},
\end{aligned}$$

where $(a_{ij})$ is a symmetric matrix, and, in addition, positive definite.

Linear systems of this type arise in the study of multicomponent
systems where there is weak coupling between stages.

The problem of solving this system is equivalent to that of determining
the minimum of the inhomogeneous quadratic form

$$\begin{aligned}
&(x^1, A_1 x^1) + (x^2, A_2 x^2) + \cdots + (x^N, A_N x^N) \\
&\quad - 2 (c^1, x^1) - 2 (c^2, x^2) - \cdots - 2 (c^N, x^N) \\
&\quad + 2 b_1 x_3 x_4 + 2 b_2 x_6 x_7 + \cdots + 2 b_{N-1} x_{3N-3} x_{3N-2},
\end{aligned}$$

where the vectors $x^k$ and $c^k$ are defined by

$$x^k = (x_{3k-2}, x_{3k-1}, x_{3k}), \qquad c^k = (c_{3k-2}, c_{3k-1}, c_{3k}),$$

and $A_k = (a_{i+3k,\, j+3k})$, $i, j = 1, 2, 3$.

Show that the problem can be reduced to that of determining the
sequence $\{f_N(z)\}$ defined by the recurrence relation

$$f_N(z) = \min_{x_{3N},\, x_{3N-1},\, x_{3N-2}} \left[ (x^N, A_N x^N) - 2 z x_{3N} - 2 (c^N, x^N) + f_{N-1}(b_{N-1} x_{3N-2}) \right].$$

(Illinois Journal of Mathematics, 1957).

61. Show that this may be reduced to the form

$$f_N(z) = \min_{y} \left[ g_N(z, y) + f_{N-1}(b_{N-1} y) \right],$$

where

$$g_N(z, y) = \min_{x_{3N},\, x_{3N-1};\; x_{3N-2} = y} \left[ (x^N, A_N x^N) - 2 z x_{3N} - 2 (c^N, x^N) \right].$$

62. Show that $f_N(z) = u_N + v_N z + w_N z^2$, where $u_N$, $v_N$, $w_N$ are
independent of $z$, determine the recurrence relations connecting $(u_N, v_N, w_N)$
and $(u_{N-1}, v_{N-1}, w_{N-1})$, and thus determine the solution of the linear
system.
63. Consider the problem of determining the maximum of

$$Q_N(x) = \sum_{i=1}^{N} (x^i, A_i x^i) + 2 \sum_{i=1}^{N-1} b_i x_{3i} x_{3i+1}$$

over the sphere $S_N$, $\sum_{i=1}^{N} (x^i, x^i) = 1$.

Consider the associated functions of $z$ defined by

$$f_N(z) = \max_{S_N} \left[ Q_N(x) + 2 z x_{3N} \right],$$

and obtain the recurrence relation connecting $f_N(z)$ and $f_{N-1}(z)$.

64. Generalize the foregoing results to the case where the matrices $A_i$
are not necessarily of the same dimension.

65.  Obtain  existence  and  uniqueness  theorems  under  appropriate 
assumptions  for  the  following  functional  equations 

a. $f(p) = \min_{q} \max \left[ g(p, q),\; f(T(p, q)) \right]$

b. $f(p) = \min_{q} \max \left[ g(p, q),\; h(p, q) f(T(p, q)) \right]$

c. $f(p) = \min_{q} \max \left[ g(p, q),\; r(p, q) + \int_R f(z) \, dG(z, p, q) \right]$.

66. Consider the problem of assigning $m$ different types of machines to
$n$ different tasks. Let $A_{ij} \ge 0$ be the amount of task $j$ performed by a
unit input of machine $i$, and assume that

a. If $A_{ij} > 0$, and $i' < i$, then $A_{i'j} > 0$.

b. If $A_{ij} > 0$, and $j' > j$, then $A_{ij'} > 0$.

c. If $i < i'$, $j < j'$, $A_{i'j} > 0$, then

$$(A_{ij} / A_{i'j}) \le (A_{ij'} / A_{i'j'}).$$

Let $x_{ij}$ be the quantity of machines of type $i$ to be used for task $j$.
The matrix $x = (x_{ij})$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, is said to be
feasible if $x_{ij} \ge 0$, $\sum_{i=1}^{m} A_{ij} x_{ij} = T_j$, $j = 1, 2, \ldots, n$, and
$\sum_{j=1}^{n} x_{ij} \le M_i$, $i = 1, 2, \ldots, m$.

Consider the following policy. Assign $x_{11}$ up to the minimum of $T_1$
and $M_1$. If $x_{11} = T_1$, then assign $x_{12} = \min (T_2, M_1 - x_{11})$, and so on.
When $M_1$ is used up in this way, on the $j$-th task for some $j$, assign $x_{2j}$ in
such a way that either task $j$ is finished or all machines of type 2 are
assigned. Complete the assignment of machines in this way.

Show that if this policy does not lead to a feasible allocation, then
there exists no feasible policy. (Arrow-Markowitz-Johnson)
67. Show that the above policy yields the solution of the problem of
maximizing $T_n = \sum_{i=1}^{m} A_{in} x_{in}$ subject to

a. $\sum_{j=1}^{n} x_{ij} = M_i$, $i = 1, 2, \ldots, m$, $x_{ij} \ge 0$,

b. $\sum_{i=1}^{m} A_{ij} x_{ij} = T_j$, $j = 1, 2, \ldots, n - 1$,

provided that the $A_{ij}$ satisfy the above conditions.

(Johnson)


68. Show that the problem of maximizing the sum $\sum_{i=1}^{N} g_i(x_i, y_i)$ subject
to the constraints

a. $x_i \ge 0$, $\sum_{i=1}^{N} x_i = x$,

b. $y_i \ge 0$, $\sum_{i=1}^{N} y_i = y$,

can, under appropriate assumptions concerning the functions $g_i(x, y)$,
be reduced to the problem of maximizing

$$S_N = \sum_{i=1}^{N} g_i(x_i, y_i) - \lambda \sum_{i=1}^{N} y_i,$$

subject to the constraints

a. $x_i \ge 0$, $\sum_{i=1}^{N} x_i = x$,

b. $y_i \ge 0$.

This last problem leads to the recurrence relations

$$f_n(x) = \max_{0 \le x_n \le x} \left[ \max_{y \ge 0} \left[ g_n(x_n, y) - \lambda y \right] + f_{n-1}(x - x_n) \right],$$

involving a one-dimensional sequence, for each fixed $\lambda$.

How does one use the solution of this second problem to solve the
original? (Proc. Nat. Acad. Sci., 1956).


69. Each year the walnut crop consists of walnuts of different grades,
say $G_1, G_2, \ldots, G_k$, in quantities $q_1, q_2, \ldots, q_k$. Using various quantities
of each grade, assortments of walnuts are put together for commercial
sale at different prices. Assume that there are fixed demands $d_i$ for the
$i$-th assortment, and that each assortment mixes walnuts of different
grades in its own fixed ratios. How many packets of each assortment
should be made in order to maximize total profit?


70.  Consider  the  case  where  the  demand  is  stochastic  with  known 
distributions  for  each  type  of  packet. 
Bibliography and Comments for Chapter IV

§ 1. This chapter follows R. Bellman, "Functional Equations in the Theory
of Dynamic Programming—I. Functions of Points and Point Transformations,"
Trans. Amer. Math. Soc., Vol. 80 (1955), pp. 51-71. An entirely
different treatment of a more abstract type, making use of Tychonoff's
Theorem, is contained in an unpublished paper by S. Karlin and H. N.
Shapiro, "Decision Processes and Functional Equations," The RAND
Corporation, RM-933, Sept. 1952.

See also, S. Karlin, "The Structure of Dynamic Programming Models,"
Naval Research Logistics Quarterly, Vol. 2 (1955), pp. 285-294.

§ 6. A discussion of the importance of stability theory in the domain of
differential equations may be found in R. Bellman, Stability Theory of
Differential Equations, McGraw-Hill, 1953.

§  8.  The  choice  of  /„  (p)  in  (8.C)  is  due  to  a  suggestion  of  H.  N.  Shapiro. 

§ 9. This equation will be discussed in extenso in the following chapter.
The Optimal Inventory Equation

§  1.  Introduction 

In  this  chapter  we  wish  to  study  a  class  of  analytic  problems  arising 
from  an  interesting  stochastic  allocation  process  occurring  in  the  study  of 
inventory  and  stock  control. 

Although  the  general  equation  seems  to  be  quite  difficult  to  treat,  we 
can  obtain  an  explicit  solution  of  a  particular  case  where  certain  simple, 
but  not  too  far  from  realistic,  assumptions  are  made,  and  we  can  deter¬ 
mine  the  structure  of  the  optimal  policy  in  some  other  cases. 

These  explicit  solutions  are  useful  since  they  lay  bare  certain  meaning¬ 
ful  combinations  of  essential  parameters.  Since  the  inverse  problem  of  the 
estimation  of  parameters  from  observed  data  plays  a  critical  role  in  this 
theory,  this  is  a  feature  which  can  be  of  importance. 

Furthermore,  and  this  is  a  remark  pertinent  to  all  decision  processes, 
the  analytic  form  of  the  solution  will  occasionally  possess  a  simple  eco¬ 
nomic  interpretation,  which  when  verbalized,  opens  the  way  to  the  ap¬ 
proximation  of  optimal  policies  for  more  complicated  processes.1 

Apart  from  the  results  we  obtain,  the  methods  we  employ  to  investigate 
the  structure  of  optimal  policies  possess  an  independent  interest.  The 
reader  has  already  encountered  them,  in  part,  in  §  12  of  Chapter  I,  and 
will  encounter  them  again  in  a  later  chapter  devoted  to  the  calculus  of 
variations.  What  stands  out  quite  vividly  is  the  fact  that  the  method  of 
successive  approximations  is  not  only  useful  in  the  production  of  exist¬ 
ence  and  uniqueness  theorems,  to  which  relatively  dull  task  it  is  usually 
relegated,  but  is,  in  addition,  a  powerful  analytic  tool  for  the  discovery 
and  proof  of  properties  of  the  solution  of  a  functional  equation,  and  in 
our  case,  for  the  determination  of  the  behavior  of  optimal  policies. 

We  shall  begin  with  the  formulation  of  a  class  of  related  problems  oc¬ 
curring  in  the  study  of  “optimal  inventory.”  Following  this,  we  devote  a 
section  to  the  simple  formal  observation  upon  which  all  the  analysis  in 
this  chapter  hinges. 

We  then  consider  a  number  of  cases  in  which  the  optimal  policy  is 

1 This idea has, of course, been used extensively in the physical and engineering
world.
characterized in an especially simple and intuitive way, namely, by the
maintenance  of  a  constant  "stock  level”.  In  particular,  this  is  the  case,  in 
both  the  multi-dimensional  as  well  as  the  one-dimensional  case,  if  all  the 
ordering  costs  are  directly  proportional  to  the  amounts  ordered. 

If  the  initial  ordering  cost  includes  a  fixed  cost  which  is  independent  of 
the  amount  ordered,  the  problem  seems  to  become  very  much  more 
difficult.  This  fixed  cost  may  represent  a  "red  tape”  cost,  or  a  "set-up” 
cost,  in  the  case  of  manufacturing  processes.  We  shall  not  treat  any  prob¬ 
lems  of  this  type  here,  since  at  the  present  time  practically  no  solutions 
of  the  corresponding  functional  equations  exist,  and  very  little  seems  to 
be  known  concerning  the  character  of  the  optimal  policies  arising  from 
processes  of  this  more  realistic  type. 

To  illustrate  further  the  method  of  successive  approximations,  we  shall 
consider two processes, each a variant of the relatively simple process
discussed  above.  In  the  first,  linearity  is  discarded,  in  that  the  cost  is 
taken  to  be  a  convex  function  of  the  amount  ordered;  in  the  second,  si¬ 
multaneity  is  voided,  in  that  there  is  assumed  to  be  a  time-lag  in  satis¬ 
fying  an  order.  Although  the  optimal  policies  cannot  be  described  in 
simple  terms,  we  can  determine  their  general  structure. 

From  the  mathematical  point  of  view,  we  have  to  deal  with  a  very  in¬ 
teresting  class  of  quasi-linear  integral  equations,  nonlinear  versions  of  the 
renewal  equation  which  we  shall  discuss  in  an  appendix.  As  usual,  these 
nonlinear  equations  possess  certain  quasi-linear  properties  which  we  can 
occasionally  use  as  handholds  and  footholds  in  making  our  way  through 
this  tortuous  terrain. 

§  2.  Formulation  of  the  general  problem 

The  problem  we  shall  discuss  here,  in  various  masquerades,  is  one  very 
particular  case  of  the  general  problem  of  decision-making  in  the  face  of 
an  uncertain  future.  The  version  we  shall  consider  is  concerned  with  the 
problem  of  stocking  a  supply  of  items  to  meet  an  uncertain  demand, 
under  the  assumptions  that  there  are  various  costs  associated  with  over- 
supply  and  undersupply. 

The  situation  may  be  described  as  follows:  At  various  specified  times, 
determined  in  advance  or  dependent  upon  the  process  itself,  we  have  an 
opportunity  to  order  supplies  of  a  certain  set  of  items,  where  the  cost  of 
ordering  depends  naturally  upon  the  number  ordered  of  each  item,  and 
where  there  may  or  may  not  be,  in  addition,  some  fixed  costs,  adminis¬ 
trative  or  otherwise,  which  are  independent  of  the  number  ordered.  At 
various  other  times,  demands  are  made  upon  the  stocks  of  these  items. 
The  interesting  case  is  that  where  these  demands  are  not  known  in  ad¬ 
vance,  but  where  we  do  know  the  joint  distribution  of  the  demands  which 
can be made at any particular time. The incentive for ordering lies in a
penalty which is assessed whenever the demand for an item exceeds the
supply. Different penalties may be levied in different fields of activity.
A  case  which  we  shall  treat  in  great  detail  is  that  where  the  penalty  is 
directly  proportional  to  the  excess  of  demand  over  supply.  Its  importance 
lies  in  the  fact  that  we  can  solve  the  functional  equations  arising  from 
the  process  explicitly  under  the  crucial  assumption  that  the  cost  of  initial 
ordering  depends  only  upon  the  amount  ordered,  and  is  either  a  linear 
function,  or,  more  generally,  convex. 

Speaking  loosely,  we  wish  to  determine  the  ordering  policy  at  each 
stage  which  will  minimize  some  average  function  of  the  overall  cost  of  the 
process. In practical applications, an important aspect of the problem,
which  we  shall  not  discuss  here,  is  that  of  determining  suitable  criteria  for 
the  various  costs,  which  are  both  realistic  and  analytically  malleable. 

In  the  following  subsections  we  shall  consider  various  sets  of  assump¬ 
tions  which  yield  various  functional  equations,  all  of  which  belong  to  a 
common  family.  Additional  processes  will  be  discussed  in  the  exercises. 

A.  Finite  total  time  period 

The  first  process  we  shall  consider  involves  the  stocking  of  only  one 
item.  We  shall  assume  that  orders  are  made  at  each  of  a  finite  number  of 
equally-spaced  times,  and  immediately  fulfilled.  After  the  order  has  been 
made  and  filled,  a  demand  is  made.  This  demand  is  satisfied  as  far  as 
possible,  with  excess  demand  leading  to  a  penalty  cost. 

Let  us  assume  that  we  know  completely  the  following  functions: 

(1) a. $\varphi(s)\,ds$ = probability that the demand will lie between $s$ and
$s + ds$.2

b. $k(z)$ = the cost of ordering $z$ items initially to increase the stock
level.

c. $p(z)$ = the cost of ordering $z$ items to meet an excess, $z$, of demand
over supply, the penalty cost.

Observe that we assume that these functions are independent of time.
Furthermore, we suppose that these orders can be filled immediately.

Let  x  denote  the  stock  level  at  the  initiation  of  the  process.  Assuming 
that there are $n$ stages, we will order a quantity $y_1$ at the first stage, $y_2$ at
the  second  stage,  and  so  on. 


2 We shall avoid Stieltjes integrals throughout to simplify the discussion. It
will readily be seen that most of our results carry over to the more general situation
when suitable attention is paid to possible nonuniqueness of roots of equations.
This is left as a set of exercises, of nontrivial nature, for the reader.
A set of functions $(y_1, y_2, \ldots, y_n)$, $y_k = y_k(x)$, specifying for each $k$
the quantity $y_k$ to be ordered at the $k$-th stage when the stock level is $x$ will
be called a policy. Corresponding to each policy, there will be a certain
expected total cost for this $n$-stage process, composed of initial ordering
and penalty costs.

The  problem  we  set  ourselves  is  that  of  determining  the  policy,  or 
policies,  which  minimize  the  expected  total  cost.  A  policy  which  yields 
this minimum expected cost is called optimal. All this is in accordance
with  our  previous  notation. 

We  obtain  an  equally  interesting  but  more  difficult  class  of  problems  if 
we  attempt  to  minimize  the  probability  that  the  cost  exceeds  a  fixed 
level. 

At any stage, the problem is characterized completely by two state
variables, $x$, the supply of stock, and $n$, the number of remaining stages.
Let us then define

(2) $f_n(x)$ = expected total cost for an $n$-stage process starting with an
initial supply $x$, and using an optimal ordering policy.

Let us now proceed to obtain a functional equation for $f_n(x)$. We have

$$f_1(x) = k(y - x) + \int_y^\infty p(s - y)\,\varphi(s)\,ds, \tag{3}$$

if a quantity $y - x \ge 0$ is ordered.

Although  it  may  seem  odd  to  order  a  quantity  y  —  x,  instead  of  say  y, 
it  turns  out  that  it  is  simpler  to  think  of  ordering  up  to  a  certain  level, y. 
The  optimal  stock  level  turns  out  to  be  a  more  basic  quantity  than  the 
amount  ordered. 

Since $y$ is to be chosen to minimize the expected cost, we see that $f_1(x)$
is given by

$$f_1(x) = \min_{y \ge x} \left[ k(y - x) + \int_y^\infty p(s - y)\,\varphi(s)\,ds \right]. \tag{4}$$

In general, for $n \ge 2$ we have

$$f_n(x) = \min_{y \ge x} \left[ k(y - x) + \int_y^\infty p(s - y)\,\varphi(s)\,ds
+ f_{n-1}(0) \int_y^\infty \varphi(s)\,ds + \int_0^y f_{n-1}(y - s)\,\varphi(s)\,ds \right], \tag{5}$$

upon enumerating the various cases corresponding to the possibility of
an excess of demand over supply, and corresponding to the possibility of
being able to fulfill the demand.
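
Once demand is discretized, the recurrence (5) can be tabulated directly. The Python sketch below is only an illustration under assumptions not made in the text: stock and demand take integer values, `phi[s]` is the probability of a demand of exactly s items, both cost functions are linear (k·z and p·z), stock is capped at `max_stock`, and f_0 is taken to be zero so that the first pass reproduces (4). The name `finite_horizon_costs` is hypothetical.

```python
def finite_horizon_costs(n_stages, phi, k, p, max_stock):
    """Tabulate f_0, f_1, ..., f_n on the stock levels 0..max_stock."""
    f_prev = [0.0] * (max_stock + 1)   # f_0 = 0: no stages remain
    tables = [f_prev]
    for _ in range(n_stages):
        # after_order[y]: expected penalty plus future cost once stock is y
        after_order = []
        for y in range(max_stock + 1):
            total = 0.0
            for s, prob in enumerate(phi):
                if s > y:
                    # demand exceeds supply: pay the penalty, restart from 0
                    total += prob * (p * (s - y) + f_prev[0])
                else:
                    # demand is met: continue from the remaining stock y - s
                    total += prob * f_prev[y - s]
            after_order.append(total)
        # minimize over the level y >= x to which the stock is raised
        f_curr = [
            min(k * (y - x) + after_order[y] for y in range(x, max_stock + 1))
            for x in range(max_stock + 1)
        ]
        tables.append(f_curr)
        f_prev = f_curr
    return tables


if __name__ == "__main__":
    for stage, table in enumerate(finite_horizon_costs(2, [0.5, 0.5], 1, 3, 2)):
        print(stage, table)
```

Computing `after_order` once per stage and reusing it for every starting level x reflects the remark that it is simpler to think of ordering up to a level y than of ordering an amount.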


B. Unbounded time period—discounted cost

If  we  wish  to  consider  an  unbounded  period  of  time  over  which  this 
process  operates,  we  must  introduce  some  device  to  prevent  infinite  costs 
from  entering. 

The  most  natural  such  device  is  that  of  discounting  the  future  costs, 
using  a  fixed  discount  ratio,  a,  0  <  a  <  1,  for  each  period.  This  possesses 
a  certain  amount  of  economic  justification  and  a  great  deal  of  mathema¬ 
tical  virtue,  particularly  in  its  invariant  aspect. 

If  we  set 

(6) $f(x)$ = expected total discounted cost starting with an initial supply
$x$ and using an optimal policy,

we obtain, by the same enumeration of possibilities, in place of (5) the
functional equation

$$f(x) = \min_{y \ge x} \left[ k(y - x) + a \int_y^\infty p(s - y)\,\varphi(s)\,ds
+ a f(0) \int_y^\infty \varphi(s)\,ds + a \int_0^y f(y - s)\,\varphi(s)\,ds \right]. \tag{7}$$

The advantage of (7) over (5) is the usual one that it contains $f(x)$, one
function of one variable, in place of a sequence of functions, $\{f_n(x)\}$.
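
A discretized version of (7) can be solved by successive approximations, starting from the zero function and reapplying the right-hand side until the iterates settle. The Python sketch below is an illustration only, under the same kind of assumptions as any grid approximation: integer stock and demand, `phi[s]` the probability of demand s, linear costs k·z and p·z, a stock cap `max_stock`, and a discount ratio `a` strictly between 0 and 1. The function name and the stopping tolerance are hypothetical choices.

```python
def discounted_policy(phi, k, p, a, max_stock, tol=1e-10, max_iter=10000):
    """Return (f, policy): approximate costs and order-up-to level per stock x."""
    f = [0.0] * (max_stock + 1)
    levels = range(max_stock + 1)
    for _ in range(max_iter):
        # discounted expected penalty plus future cost once stock is raised to y
        after_order = []
        for y in levels:
            total = 0.0
            for s, prob in enumerate(phi):
                if s > y:
                    total += prob * (p * (s - y) + f[0])
                else:
                    total += prob * f[y - s]
            after_order.append(a * total)
        # best level y >= x for each starting stock x
        policy = [
            min(range(x, max_stock + 1), key=lambda y: k * (y - x) + after_order[y])
            for x in levels
        ]
        new_f = [k * (policy[x] - x) + after_order[policy[x]] for x in levels]
        gap = max(abs(u - v) for u, v in zip(new_f, f))
        f = new_f
        if gap < tol:
            return f, policy
    raise RuntimeError("successive approximations did not converge")


if __name__ == "__main__":
    costs, policy = discounted_policy([0.5, 0.5], 1, 4, 0.5, 2)
    print(costs, policy)
```

Because the discount ratio is below 1, each pass shrinks the largest difference between successive iterates by at least that ratio, which is why a simple tolerance test suffices here.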


C.  Unbounded  time  period — partially  expendable  items 

If we assume that some of the items supplied upon demand may be
partially recovered, so that a demand of $s$ items results in a return of $bs$
items, $0 \le b \le 1$, which may be used again, the analogue of (7) is

$$f(x) = \min_{y \ge x} \left[ k(y - x) + a \int_y^\infty p(s - y)\,\varphi(s)\,ds
+ a \int_y^\infty f(bs)\,\varphi(s)\,ds + a \int_0^y f(y - s + bs)\,\varphi(s)\,ds \right]. \tag{8}$$

D. Unbounded time period—one period lag in supply

Let  us  now  assume  that  when  we  order  a  quantity  z  it  does  not  become 
available  until  one  period  later.  If  the  current  supply  is  x  and  y  was  on 
order  from  the  period  before,  x  +  y  will  be  available  to  meet  the  next 
demand.  The  functional  equation  corresponding  to  (7)  is  now  of  more 
complicated  form 

$$f(x) = \min_{z \ge 0} \left[ kz + a \int_x^\infty p(s - x)\,\varphi(s)\,ds
+ a f(z) \int_x^\infty \varphi(s)\,ds + a \int_0^x f(x - s + z)\,\varphi(s)\,ds \right]. \tag{9}$$

The quantity $x$ now represents the total quantity available at any stage
to  meet  the  demand. 

E.  Unbounded  time  period — two  period  lag 

If  we  have  a  two  period  lag,  we  require  two  state  variables  to  describe 
the  state  of  the  process,  namely, 

(10)  x  =  quantity  of  stock  available  to  meet  next  demand, 
y  =  quantity  to  be  delivered  one  period  hence. 

Hence  we  define 

(11) $f(x, y)$ = expected total cost with $x$ and $y$ as above, using an
optimal policy.

Then $f(x, y)$ satisfies the equation

$$f(x, y) = \min_{z \ge 0} \left[ kz + a \int_x^\infty p(s - x)\,\varphi(s)\,ds
+ a f(y, z) \int_x^\infty \varphi(s)\,ds + a \int_0^x f(x - s + y, z)\,\varphi(s)\,ds \right]. \tag{12}$$

We  shall  not  consider  the  equations  in  (8),  (9),  or  (12)  here,  although 
they  are  amenable  to  the  same  techniques  of  successive  approximation 
we  shall  apply  to  the  others.  There  does  not  seem  to  exist  any  explicit 
solution  comparable  in  simplicity  to  that  obtainable  for  (7). 

§  3.  A  simple  observation 

In  this  section,  we  wish  to  present,  in  as  simple  a  form  as  possible,  the 
fundamental  analytic  property  of  functional  equations  of  the  form 

$$u(x) = \min_{y} v(x, y), \quad y \in R(x), \tag{1}$$

upon which all the subsequent work in this chapter depends.3

In general, the variation will be over some region, $R(x)$, in this case, a
set of intervals, dependent upon $x$. Let us assume that over some interval
of $x$-values, $a \le x \le b$, the minimum is attained inside the region $R(x)$,
and that $v$ is differentiable. Then at the minimizing value of $y$ we have

$$0 = v_y. \tag{2}$$

This determines a function $y(x)$, which need not be single-valued but
which we do assume differentiable.

3  This  property  has  already  been  used,  without  explicit  remark  in  §  11  of 
Chapter  I 
On any one particular branch of this function $y(x)$, we have

$$u(x) = v(x, y). \tag{3}$$

The crucial observation is now that for $a \le x \le b$, we have

$$u'(x) = v_x + v_y \, \frac{dy}{dx} = v_x, \tag{4}$$

since $v_y = 0$, by (2).

Similarly, if

$$u(x_1, x_2) = \min_{y_1, y_2} \left[ v(x_1, x_2, y_1, y_2) \right], \quad (y_1, y_2) \in R(x_1, x_2), \tag{5}$$

and we assume that the minimum is always attained inside the region,
we have

$$u_{x_1} = v_{x_1}, \qquad u_{x_2} = v_{x_2}, \tag{6}$$

at the minimizing points.
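
The observation in (4) is easy to test numerically on a small example. The Python sketch below is an illustration only: the function v(x, y) = (y − x)² + y² is a hypothetical choice (it does not come from the text), the minimization over y is done by a ternary search that relies on v being convex in y, and u'(x) is estimated by a central difference. The two numbers returned should agree closely whenever the minimum is interior.

```python
def v(x, y):
    """Hypothetical test function, convex in y for each fixed x."""
    return (y - x) ** 2 + y ** 2


def v_x(x, y):
    """Partial derivative of v with respect to x, holding y fixed."""
    return -2.0 * (y - x)


def minimizing_y(x, lo=-10.0, hi=10.0, iterations=200):
    """Ternary search for the y that minimizes v(x, y) on [lo, hi]."""
    for _ in range(iterations):
        third = (hi - lo) / 3.0
        left, right = lo + third, hi - third
        if v(x, left) < v(x, right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)


def u(x):
    return v(x, minimizing_y(x))


def compare_derivatives(x, h=1e-4):
    """Return (finite-difference u'(x), v_x evaluated at the minimizing y)."""
    du_dx = (u(x + h) - u(x - h)) / (2.0 * h)
    return du_dx, v_x(x, minimizing_y(x))


if __name__ == "__main__":
    print(compare_derivatives(1.5))
```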

Let us now apply these remarks to the functional equation of (2.7),
under the assumption that $k(z) = kz$, $k > 0$, and $p(z) = pz$, linear
functions of $z$. We have

$$f(x) = \min_{y \ge x} \left[ ky - kx + a p \int_y^\infty (s - y)\,\varphi(s)\,ds
+ a f(0) \int_y^\infty \varphi(s)\,ds + a \int_0^y f(y - s)\,\varphi(s)\,ds \right]. \tag{7}$$

If the minimum is attained at a point $y > x$, we have at this point

$$k - a p \int_y^\infty \varphi(s)\,ds + a \int_0^y f'(y - s)\,\varphi(s)\,ds = 0, \tag{8}$$

an equation independent of $x$!

Furthermore, for this value of $y$, we have

$$f'(x) = -k. \tag{9}$$

These two results, correctly combined and interpreted, furnish the clues
to the solutions of the problems involving proportional costs. We shall
discuss them in more detail in later sections, and we shall also utilize their
multi-dimensional analogues.
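
As a check on (8) and (9), the differentiation can be written out term by term. This is a sketch only, assuming that f and the demand density are smooth enough to differentiate under the integral sign; the symbol v(x, y) is used here for the bracketed expression in (7), following the notation of (1), and is not introduced for this purpose in the text.

```latex
\begin{aligned}
v(x,y) &= k(y-x) + a p \int_y^\infty (s-y)\,\varphi(s)\,ds
          + a f(0) \int_y^\infty \varphi(s)\,ds
          + a \int_0^y f(y-s)\,\varphi(s)\,ds, \\
\frac{\partial v}{\partial y}
       &= k - a p \int_y^\infty \varphi(s)\,ds
          - a f(0)\,\varphi(y)
          + a f(0)\,\varphi(y)
          + a \int_0^y f'(y-s)\,\varphi(s)\,ds \\
       &= k - a p \int_y^\infty \varphi(s)\,ds
          + a \int_0^y f'(y-s)\,\varphi(s)\,ds, \\
\frac{\partial v}{\partial x} &= -k .
\end{aligned}
```

The two boundary terms involving f(0) cancel, so setting the derivative with respect to y equal to zero leaves an equation in y alone, and the derivative with respect to x, combined with the observation (4) of this section, gives the slope of f wherever the minimizing y exceeds x.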

§ 4. Constant stock level—preliminary discussion

In this and the next few sections we shall consider several processes
characterized by the principle of "constant stock level." The common feature
of these models is the assumption that the cost of initial ordering is
directly proportional to the amount ordered, and that the distribution
of  demand  remains  the  same  from  stage  to  stage.  The  addition  of  an 
administrative  fixed  cost,  “red-tape”  cost,  changes  the  nature  of  the 
optimal  policy  in  an  essential  manner.4  This  cost  may  also  represent  "set¬ 
up”  cost  in  manufacturing  processes. 

In § 5, we shall obtain the complete solution, for an arbitrary distribution function $\varphi(s)$, for the case where the penalty cost is also directly proportional to the number ordered. In § 6 we extend this result to the multi-dimensional case, and show that the solution for the case where there are many items subject to a joint distribution of demand possesses the very important property of sub-optimality.

Turning  from  the  consideration  of  these  processes  involving  unbounded 
time  intervals,  we  consider  the  finite  process  described  in  §  2  and  show 
that  again  the  assumption  of  direct  proportionality  entails  a  principle  of 
constant  stock  level  at  each  stage.  This  level,  of  course,  changes  with  the 
stage. 

This section serves as an excellent introduction to the use of successive approximations as an analytic tool in the study of these functional equations.

We enter territory where the going is much rougher when we consider the case where the penalty cost includes a "red-tape" term which is independent of the amount ordered. The form of the solution now seems in the general case to depend upon the form of the demand function. Nevertheless, several important classes of distribution functions fall within categories which we can handle precisely.

Finally,  we  indicate  briefly  the  form  of  the  general  solution  without, 
however,  being  able  to  make  any  constructive  use  of  it. 

§  5.  Proportional  cost — one-dimensional  case 

In  this  section  we  present  the  solution  of  the  case  where  both  cost 
functions,  direct  ordering  and  penalty  ordering,  are  directly  proportional 
to  the  amounts  ordered. 

THEOREM 1. Consider the equation

(1) $f(x) = \min_{y \ge x} \Big[ k(y - x) + a \int_y^\infty p(s - y)\varphi(s)\,ds + a f(0) \int_y^\infty \varphi(s)\,ds + a \int_0^y f(y - s)\varphi(s)\,ds \Big]$,

where we impose the conditions

(2) a. $k$ and $p$ are positive constants,

    b. $\varphi(s) > 0$, $\int_0^\infty \varphi(s)\,ds = 1$, $\int_0^\infty s\,\varphi(s)\,ds < \infty$,

    c. $0 < a < 1$,

    d. $ap > k$.

Let $\bar{x}$ be the unique root of

(3) $k = ap \int_{\bar{x}}^\infty \varphi(s)\,ds + ak \int_0^{\bar{x}} \varphi(s)\,ds$.⁵

⁴ In the sense that it changes the policy from one of known form to one of unknown form.

Then the optimal policy has the form

(4) a. for $0 \le x \le \bar{x}$, $y = \bar{x}$,

    b. for $x \ge \bar{x}$, $y = x$.

In other words, the optimal stock level is $\bar{x}$.

If $ap \le k$, the solution is given by $y = x$ for $x \ge 0$, i.e. never order.
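As an illustration of Theorem 1, the critical level can be located numerically. The sketch below is not part of the original argument: it assumes the demand distribution is supplied as a cumulative distribution function `cdf` (a hypothetical argument name), and it finds by bisection the root $\bar{x}$ of $k = ap\,[1 - \Phi(\bar{x})] + ak\,\Phi(\bar{x})$, where $\Phi$ denotes that cumulative distribution. The exponential distribution in `exp_cdf` is an assumed example.

```python
import math

def critical_stock_level(k, p, a, cdf, hi=1.0, tol=1e-10):
    """Root of k = a*p*(1 - cdf(x)) + a*k*cdf(x); returns 0 when a*p <= k."""
    if a * p <= k:
        return 0.0  # never order
    def u(x):
        # Increasing in x: negative at x = 0, positive for large x when a*p > k.
        return k - a * p * (1.0 - cdf(x)) - a * k * cdf(x)
    while u(hi) < 0.0:  # expand the bracket until the sign changes
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exp_cdf(b):
    # Assumed example: demand density b * exp(-b * s) for s >= 0.
    return lambda x: 1.0 - math.exp(-b * x)
```

For instance, `critical_stock_level(1.0, 3.0, 0.9, exp_cdf(2.0))` can be compared with the closed form $-\ln\big(1 - (ap - k)/(a(p - k))\big)/b$ that holds for the exponential case.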

Proof.  In  order  to  understand  the  genesis  of  this  solution,  let  us  proceed 
heuristically.  If  we  can  obtain  a  plausible  solution  by  some  formal  means 
and  then  verify  directly  that  it  satisfies  the  equation  in  (1)  above,  the 
uniqueness  theorem  established  in  §  9  of  Chapter  IV  tells  us  that  it  is  the 
solution. Let us point out, however, that the method of successive approximations would have led us to this solution in a systematic fashion.

As pointed out in § 3, if the minimum occurs at $y > x$, the minimizing values of $y$ must be roots of the equation

(5) $k + a \Big[ -p \int_y^\infty \varphi(s)\,ds + \int_0^y f'(y - s)\varphi(s)\,ds \Big] = 0$,

and at this value of $y$ we have

(6) $f'(x) = -k$.

Now let us pull ourselves up by our bootstraps. If the solution has the conjectured form, the complicated term $\int_0^y f'(y - s)\varphi(s)\,ds$ may be replaced by the simpler term $-k \int_0^y \varphi(s)\,ds$, so that equation (5) may be replaced by

(7) $k - ap \int_y^\infty \varphi(s)\,ds - ak \int_0^y \varphi(s)\,ds = 0$,

precisely the equation of (3).

Since $\int_0^\infty \varphi(s)\,ds = 1$, this equation reduces to

(8) $\int_0^y \varphi(s)\,ds = \dfrac{ap - k}{a(p - k)}$,

which possesses exactly one root under the assumption that $\varphi(s) > 0$. Observe that the limiting cases behave properly. If $ap - k = 0$, $y = 0$; if $a = 1$, $y = \infty$; if $p = \infty$, $y = \infty$.

⁵ The interpretation of this equation is that the run-out probability must be set at the level where the marginal cost for holding inventory is just balanced by the marginal penalty for run-out.

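Having determined $\bar{x}$, we proceed to determine $f(x)$ as follows. For $0 \le x \le \bar{x}$ we have

(9) $f(x) = k(\bar{x} - x) + a \Big[ \int_{\bar{x}}^\infty p(s - \bar{x})\varphi(s)\,ds + f(0) \int_{\bar{x}}^\infty \varphi(s)\,ds + \int_0^{\bar{x}} f(\bar{x} - s)\varphi(s)\,ds \Big]$,

and $f'(x) = -k$, or,

(10) $f(x) = f(0) - kx$.

Substituting (10) in (9), and setting $x = 0$, we obtain the following result for $f(0)$,⁶

(11) $f(0) = \dfrac{k\bar{x} + pa \int_{\bar{x}}^\infty (s - \bar{x})\varphi(s)\,ds - ak \int_0^{\bar{x}} (\bar{x} - s)\varphi(s)\,ds}{1 - a}$.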
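The expression for $f(0)$ lends itself to a quick numerical check. The following sketch assumes, purely for illustration, the exponential density $\varphi(s) = b e^{-bs}$, for which both integrals have closed forms; the function names are hypothetical. It evaluates the right-hand side of the expression for $f(0)$ at an arbitrary candidate level, so that one can confirm that the critical level gives a smaller value than nearby levels.

```python
import math

def f0_for_level(x, k, p, a, b):
    """Value of f(0) when the stock is always raised to level x,
    for the assumed density b * exp(-b * s)."""
    shortage = math.exp(-b * x) / b             # integral of (s - x) * phi(s) over s >= x
    surplus = x - (1.0 - math.exp(-b * x)) / b  # integral of (x - s) * phi(s) over 0 <= s <= x
    return (k * x + p * a * shortage - a * k * surplus) / (1.0 - a)

def critical_level_exponential(k, p, a, b):
    """Closed-form critical level for the same density, valid when a*p > k."""
    return -math.log(1.0 - (a * p - k) / (a * (p - k))) / b
```

The derivative of the numerator with respect to the level is $k - ap\int_x^\infty \varphi\,ds - ak\int_0^x \varphi\,ds$, which vanishes exactly at the critical level, so the comparison is a direct check of that stationarity.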

To determine $f(x)$ for $x \ge \bar{x}$⁷ we have the equation

(12) $f(x) = a \Big[ \int_x^\infty p(s - x)\varphi(s)\,ds + f(0) \int_x^\infty \varphi(s)\,ds + \int_0^x f(x - s)\varphi(s)\,ds \Big]$,

which we write in the form

(13) $f(x) = u(x) + a \int_0^x f(x - s)\varphi(s)\,ds$,

where $u(x)$ is a known function of $x$. This, in turn, we write

(14) $f(x) = u(x) + a \int_0^{x - \bar{x}} f(x - s)\varphi(s)\,ds + a \int_{x - \bar{x}}^x f(x - s)\varphi(s)\,ds$.

In the interval $[x - \bar{x}, x]$, $f(x - s)$ is known, hence we may write, combining the $u(x)$ term and the second integral,

(15) $f(x) = v(x) + a \int_0^{x - \bar{x}} f(x - s)\varphi(s)\,ds, \quad x \ge \bar{x}$.

If we now set $x - \bar{x} = z$ and $f(\bar{x} + z) = g(z)$, we see that $g(z)$ satisfies the equation

(16) $g(z) = v(\bar{x} + z) + a \int_0^z g(z - s)\varphi(s)\,ds, \quad z \ge 0$,

a simple renewal equation whose properties are discussed in the appendix.

⁶ Note that the $\bar{x}$ we obtain from (7) is the value of $\bar{x}$ which minimizes this expression for $f(0)$.

⁷ Observe that as far as applications are concerned, this part of the solution is of very little interest, since for only one initial interval, if at all, will $x$ ever exceed $\bar{x}$.
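A renewal equation of this kind can be solved numerically by marching forward in $z$. The sketch below is one possible discretization, not the method of the text: it assumes a uniform grid and the trapezoidal rule, and the names `solve_renewal`, `v` and `phi` are hypothetical. Because the value at $s = 0$ involves the unknown $g(z_i)$ itself, each step solves a one-term linear equation.

```python
import math

def solve_renewal(v, phi, a, z_max, n):
    """Approximate g on a uniform grid of n steps over [0, z_max], where
    g(z) = v(z) + a * integral_0^z g(z - s) * phi(s) ds."""
    h = z_max / n
    phi_vals = [phi(j * h) for j in range(n + 1)]
    g = [v(0.0)]  # at z = 0 the integral is empty
    for i in range(1, n + 1):
        # Interior trapezoid nodes s_j, j = 1..i-1, use values of g already computed.
        acc = sum(g[i - j] * phi_vals[j] for j in range(1, i))
        acc += 0.5 * g[0] * phi_vals[i]  # end point s = z_i
        # The end point s = 0 carries the unknown g[i]; solve for it.
        g_i = (v(i * h) + a * h * acc) / (1.0 - 0.5 * a * h * phi_vals[0])
        g.append(g_i)
    return g
```

As a sanity check under assumed data, taking $v \equiv 1$, $\varphi(s) = b e^{-bs}$ gives the exact solution $g(z) = \big(1 - a e^{-b(1-a)z}\big)/(1 - a)$, against which the grid values can be compared.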

Actually, it is much simpler to differentiate (12) first and then proceed as above. Let us observe, parenthetically, that it seems to be a general characteristic of functional equations in the theory of dynamic programming that the derivatives satisfy simpler equations, and are the more basic quantities. This is due to the fact that they represent "marginal returns", or "prices", which in purely mathematical language means that they represent Lagrange multipliers. This, in turn, is connected with the general problem of constructing dual processes, a subject we shall not pursue here.

Let us now turn to a proof that the conjectured solution is actually a solution. Call the function obtained above $F(x)$ and denote the constant $f(0)$ determined in (11) by $C$. Then $F(x)$ is completely determined by the following equations.

(17) a. $F(x) = C - kx, \quad 0 \le x \le \bar{x}$,

     b. $F(x) = a \Big[ \int_x^\infty p(s - x)\varphi(s)\,ds + F(0) \int_x^\infty \varphi(s)\,ds + \int_0^x F(x - s)\varphi(s)\,ds \Big], \quad x \ge \bar{x}$.

An essential point in our verification of the solution is the fact that $F(x) + kx$ is strictly increasing for $x > \bar{x}$. This we establish as follows.

From (17b), we see that

(18) $F'(x) = -ap \int_x^\infty \varphi(s)\,ds + a \int_0^x F'(x - s)\varphi(s)\,ds$,

for $x > \bar{x}$. For $s$ in $[x - \bar{x}, x]$, we have $0 \le x - s \le \bar{x}$ and hence $F'(x - s) = -k$, as we see from (17a). Thus for $x > \bar{x}$,

(19) $F'(x) = -ap \int_x^\infty \varphi(s)\,ds - ka \int_{x - \bar{x}}^x \varphi(s)\,ds + a \int_0^{x - \bar{x}} F'(x - s)\varphi(s)\,ds$,

or

(20) $F'(x) + k = \Big[ k - ap \int_x^\infty \varphi(s)\,ds - ak \int_0^x \varphi(s)\,ds \Big] + a \int_0^{x - \bar{x}} [F'(x - s) + k]\,\varphi(s)\,ds$.

The expression

(21) $u(x) = k - ap \int_x^\infty \varphi(s)\,ds - ak \int_0^x \varphi(s)\,ds$

is zero at $x = \bar{x}$ and positive thereafter. Setting $x - \bar{x} = z$ and $F'(\bar{x} + z) + k = g(z)$, we see that $g(z)$ satisfies the equation

(22) $g(z) = u(\bar{x} + z) + a \int_0^z g(z - s)\varphi(s)\,ds, \quad z \ge 0$.

It follows, cf. § 18, that $g(z) > 0$ for $z > 0$.

Hence, $F'(x) + k > 0$ for $x > \bar{x}$, and $F(x) + kx$ is strictly increasing for $x > \bar{x}$.

Let us now return to the problem of demonstrating that $F(x)$ satisfies the equation in (1). Here $[\,\cdots]$ denotes the bracketed expression in (1) with $f$ replaced by $F$. Consider first the case where $x \ge \bar{x}$. Then

(23) $\min_{y \ge x} [\,\cdots] = \min_{y \ge x} [\,k(y - x) + F(y)\,]$,

using the representation in (17b). Since $ky + F(y) \ge kx + F(x)$ for $y \ge x$, we see that the minimum occurs at $y = x$, yielding $F(x)$, as desired.

Now consider the interval $0 \le x \le \bar{x}$. Write

(24) $\min_{y \ge x} [\,\cdots] = \min \Big[ \min_{y \ge \bar{x}} [\,\cdots],\ \min_{\bar{x} \ge y \ge x} [\,\cdots] \Big]$.

As above, the minimum over $y \ge \bar{x}$ reduces to the value at $y = \bar{x}$. Hence

(25) $\min_{y \ge x} [\,\cdots] = \min_{\bar{x} \ge y \ge x} [\,\cdots]$.

Since $F(x) = C - kx$ for $0 \le x \le \bar{x}$, it follows that the minimum is assumed at $y = \bar{x}$, as in the original derivation of the value of $\bar{x}$.
In the case $ap \le k$, taking $\bar{x} = 0$ in (17) yields an $F$ which is easily seen to satisfy (23), since, as above, $F(y) + ky$ is non-decreasing.

This completes the proof. It is interesting to note that the solution for $0 \le x \le \bar{x}$, the most important part of the solution, can be found without reference to the form of the solution for $x > \bar{x}$.

This completes the verification of the fact that $F(x)$ is a solution, and consequently the solution, within the class of uniformly bounded functions over $x \ge 0$.

§  6.  Proportional  cost — multi-dimensional  case 

Let us now consider the multi-dimensional version of the problem. Here we have $N$ items whose stock levels will be denoted by $x_1, x_2, \ldots, x_N$, and whose demand $(s_1, s_2, \ldots, s_N)$ at any time is subject to a joint distribution function whose density is $\varphi(s_1, s_2, \ldots, s_N)$.

In formulating the functional equation for the function $f(x_1, x_2, \ldots, x_N)$, the minimum expected over-all discounted cost, let us, for the sake of simplicity, consider only the two-dimensional case.

The remarkable fact that emerges is that the form of the solution is precisely the same as if $\varphi(s_1, s_2, \ldots, s_N)$ had the form $\varphi_1(s_1)\,\varphi_2(s_2) \cdots \varphi_N(s_N)$, i.e. uncorrelated demands. It is this which yields the important sub-optimalization property of the solution which we discuss below. An enumeration of cases yields the following functional equation for $f(x_1, x_2)$:

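(1)
$$
\begin{aligned}
f(x_1, x_2) = \min_{y_1 \ge x_1,\ y_2 \ge x_2} \Big[\, & k_1(y_1 - x_1) + k_2(y_2 - x_2) \\
& + a \Big[ \int_{s_1 = y_1}^\infty \int_{s_2 = y_2}^\infty [\,p_1(s_1 - y_1) + p_2(s_2 - y_2)\,]\,\varphi(s_1, s_2)\,ds_1\,ds_2 \\
& \quad + f(0, 0) \int_{s_1 = y_1}^\infty \int_{s_2 = y_2}^\infty \varphi(s_1, s_2)\,ds_1\,ds_2 \\
& \quad + \int_{s_1 = y_1}^\infty \int_{s_2 = 0}^{y_2} [\,p_1(s_1 - y_1) + f(0, y_2 - s_2)\,]\,\varphi(s_1, s_2)\,ds_1\,ds_2 \\
& \quad + \int_{s_1 = 0}^{y_1} \int_{s_2 = y_2}^\infty [\,f(y_1 - s_1, 0) + p_2(s_2 - y_2)\,]\,\varphi(s_1, s_2)\,ds_1\,ds_2 \\
& \quad + \int_{s_1 = 0}^{y_1} \int_{s_2 = 0}^{y_2} f(y_1 - s_1, y_2 - s_2)\,\varphi(s_1, s_2)\,ds_1\,ds_2 \Big] \Big].
\end{aligned}
$$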

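Let us simplify our notation a bit by setting $\varphi(s_1, s_2)\,ds_1\,ds_2 = dG(s_1, s_2)$ and call the quantity within the brackets $K(y_1, y_2)$. We then have

(2)
$$
\begin{aligned}
\frac{\partial K}{\partial y_1} = k_1 + a \Big[ & -p_1 \int_{s_1 = y_1}^\infty \Big( \int_{s_2 = 0}^\infty dG(s_1, s_2) \Big) + \int_{s_1 = 0}^{y_1} \frac{\partial f}{\partial y_1}(y_1 - s_1, 0) \Big( \int_{s_2 = y_2}^\infty dG(s_1, s_2) \Big) \\
& + \int_{s_1 = 0}^{y_1} \int_{s_2 = 0}^{y_2} \frac{\partial f}{\partial y_1}(y_1 - s_1, y_2 - s_2)\,dG(s_1, s_2) \Big], \\
\frac{\partial K}{\partial y_2} = k_2 + a \Big[ & -p_2 \int_{s_2 = y_2}^\infty \Big( \int_{s_1 = 0}^\infty dG(s_1, s_2) \Big) + \int_{s_2 = 0}^{y_2} \frac{\partial f}{\partial y_2}(0, y_2 - s_2) \Big( \int_{s_1 = y_1}^\infty dG(s_1, s_2) \Big) \\
& + \int_{s_1 = 0}^{y_1} \int_{s_2 = 0}^{y_2} \frac{\partial f}{\partial y_2}(y_1 - s_1, y_2 - s_2)\,dG(s_1, s_2) \Big].
\end{aligned}
$$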

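Furthermore, as above, if $y_1 > x_1$, $y_2 > x_2$, we have

(3) $\dfrac{\partial f}{\partial x_1} = -k_1, \qquad \dfrac{\partial f}{\partial x_2} = -k_2$.

Consequently, if we assume that the solution here has the same form as in the one-dimensional case, the critical levels $\bar{x}_1$ and $\bar{x}_2$ are given as roots of the equations

(4) a. $k_1 + a \Big[ -p_1 \int_{s_1 = \bar{x}_1}^\infty \Big( \int_{s_2 = 0}^\infty dG(s_1, s_2) \Big) - k_1 \int_{s_1 = 0}^{\bar{x}_1} \Big( \int_{s_2 = 0}^\infty dG(s_1, s_2) \Big) \Big] = 0$,

    b. $k_2 + a \Big[ -p_2 \int_{s_2 = \bar{x}_2}^\infty \Big( \int_{s_1 = 0}^\infty dG(s_1, s_2) \Big) - k_2 \int_{s_2 = 0}^{\bar{x}_2} \Big( \int_{s_1 = 0}^\infty dG(s_1, s_2) \Big) \Big] = 0$.

These roots exist and are unique provided we make the same assumptions as above, namely, $ap_1 > k_1$, $ap_2 > k_2$, and $dG > 0$.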

We see that $\bar{x}_1$ depends for its determination only upon the marginal distribution $\int_{s_2 = 0}^\infty dG(s_1, s_2)$, and similarly to determine $\bar{x}_2$ we require only $\int_{s_1 = 0}^\infty dG(s_1, s_2)$.

This is the important property of suboptimalization mentioned above. The verification of the solution follows precisely the same lines as that for the one-dimensional case, and hence will be omitted, since the details are, of course, much more tedious.
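The sub-optimalization property can be illustrated with a small discrete stand-in for the continuous model. Everything in the sketch below is an assumption made for illustration: demands take integer values, the joint distribution is a probability table `joint[s1][s2]`, and the critical level is taken as the smallest level at which the cumulative distribution of one item's demand reaches $(ap_i - k_i)/(a(p_i - k_i))$. Two tables with the same individual demand distributions but very different dependence are compared.

```python
def marginal_cdf(joint, axis):
    """Cumulative distribution of one item's demand from a joint table joint[s1][s2]."""
    if axis == 0:
        pmf = [sum(row) for row in joint]
    else:
        pmf = [sum(col) for col in zip(*joint)]
    cdf, total = [], 0.0
    for q in pmf:
        total += q
        cdf.append(total)
    return cdf

def critical_level(k, p, a, cdf):
    """Smallest grid level whose CDF reaches (a*p - k) / (a*(p - k));
    a discrete analogue, assumed here, of the root of the critical-level equation."""
    if a * p <= k:
        return 0
    target = (a * p - k) / (a * (p - k))
    for level, c in enumerate(cdf):
        if c >= target:
            return level
    return len(cdf) - 1

m = [0.1, 0.2, 0.3, 0.4]                                  # common single-item distribution
independent = [[mi * mj for mj in m] for mi in m]         # independent demands
comonotone = [[m[i] if i == j else 0.0 for j in range(4)] for i in range(4)]  # identical demands
```

Applying `critical_level` to `marginal_cdf(independent, 0)` and to `marginal_cdf(comonotone, 0)` gives the same level for item 1, and likewise for item 2, since only the single-item distributions enter.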

Let us state our conclusion as

Theorem 2. Let us impose the following conditions upon the equation in (1):

(5) a. $k_i$ and $p_i$ are positive constants,

    b. $\varphi > 0$, $\int_0^\infty \int_0^\infty \varphi\,ds_1\,ds_2 = 1$, $\int_0^\infty \int_0^\infty s_i\,\varphi\,ds_1\,ds_2 < \infty$,

    c. $0 < a < 1$,

    d. $ap_i > k_i$.

Let $\bar{x}_i$ be the unique root of

(6) $k_i = ap_i \int_{s_i = \bar{x}_i}^\infty \Big( \int_{s_j = 0}^\infty \varphi(s_1, s_2)\,ds_j \Big) ds_i + ak_i \int_{s_i = 0}^{\bar{x}_i} \Big( \int_{s_j = 0}^\infty \varphi(s_1, s_2)\,ds_j \Big) ds_i, \quad j \ne i$.

Then the optimal policy has the form

(7) a. for $0 \le x_i \le \bar{x}_i$, $y_i = \bar{x}_i$,

    b. for $x_i \ge \bar{x}_i$, $y_i = x_i$.

In other words, the optimal stock level for the $i$-th item is $\bar{x}_i$.

If $ap_i \le k_i$ for any $i$, we set $\bar{x}_i = 0$ $(i = 1, 2)$.

It is clear that this form of the solution extends immediately to the $N$-dimensional case.

§ 7. Finite time period

Let  us  now  consider  the  corresponding  problem  for  a  finite  process 
where  we  do  not  discount  future  costs.  We  now  wish  to  minimize  the 
total  expected  cost. 

We define

(1) $f_N(x)$ = expected cost over an $N$-stage period starting with an initial quantity $x$ and using an optimal $N$-stage policy.

Then

(2) $f_1(x) = \min_{y \ge x} \Big[ k(y - x) + p \int_y^\infty (s - y)\varphi(s)\,ds \Big]$,

    $f_{n+1}(x) = \min_{y \ge x} \Big[ k(y - x) + p \int_y^\infty (s - y)\varphi(s)\,ds + f_n(0) \int_y^\infty \varphi(s)\,ds + \int_0^y f_n(y - s)\varphi(s)\,ds \Big], \quad n = 1, 2, \ldots$

We wish to prove, under the natural assumption $p > k$,

Theorem 3. For each $n$, the optimal policy has the form

(3) a. for $x \le \bar{x}_n$, $y = \bar{x}_n$,

    b. for $x \ge \bar{x}_n$, $y = x$,

where the sequence $\bar{x}_n$ is monotone increasing in $n$.
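The recurrence in (2) can be explored numerically before reading the proof. The sketch below is a discrete analogue, assumed for illustration only: demand takes the integer values $0, 1, 2, \ldots$ with probabilities `pmf`, stock levels are integers up to `max_stock`, and integrals become sums. The function name and arguments are hypothetical. It returns the level to which an empty stock is raised at each stage count $n = 1, 2, \ldots$

```python
def critical_levels(pmf, k, p, n_stages, max_stock):
    """Order-up-to levels for n = 1..n_stages in a discrete version of the
    finite-horizon recurrence (no discounting)."""
    f_prev = None
    levels = []
    for _ in range(n_stages):
        v = []
        for y in range(max_stock + 1):
            # Penalty cost for demand in excess of the stock level y.
            shortage = sum(p * (s - y) * q for s, q in enumerate(pmf) if s > y)
            # Cost of the remaining stages; stock after the stage is max(y - s, 0).
            future = 0.0
            if f_prev is not None:
                future = sum(f_prev[max(y - s, 0)] * q for s, q in enumerate(pmf))
            v.append(k * y + shortage + future)
        # f_n(x) + k*x = min over y >= x of v(y)
        f_prev = [min(v[x:]) - k * x for x in range(max_stock + 1)]
        levels.append(min(range(max_stock + 1), key=v.__getitem__))
    return levels
```

With a bounded demand distribution the levels cannot exceed the largest possible demand, so in this discrete setting the sequence is non-decreasing and eventually constant rather than strictly increasing.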

Proof. The proof will be inductive. We have, with $f_1(x)$ defined as in (2), as our critical stock level the solution of

(4) $k = p \int_y^\infty \varphi(s)\,ds$,

which, if it exists, is unique, and which does exist if $p > k$, as is reasonable. Call this value $\bar{x}_1$. It is clear then that for $n = 1$, the optimal policy is $y = \bar{x}_1$ for $x \le \bar{x}_1$; $y = x$ for $x \ge \bar{x}_1$. When $y = \bar{x}_1$, we have $f_1'(x) = -k$, and for $x \ge \bar{x}_1$, we have

(5) $f_1(x) = p \int_x^\infty (s - x)\varphi(s)\,ds$,

    $f_1'(x) = -p \int_x^\infty \varphi(s)\,ds \ge -k$,

    $f_1''(x) = p\,\varphi(x) > 0$.

Hence $f_1'(x) + k \ge 0$ for all $x \ge 0$.

Consider the case $n = 2$. We have

(6) $f_2(x) = \min_{y \ge x} \Big[ k(y - x) + p \int_y^\infty (s - y)\varphi(s)\,ds + f_1(0) \int_y^\infty \varphi(s)\,ds + \int_0^y f_1(y - s)\varphi(s)\,ds \Big]$.

The critical value of $y$ is attained by setting the partial derivative with respect to $y$ equal to zero, or

(7) $k = p \int_y^\infty \varphi(s)\,ds - \int_0^y f_1'(y - s)\varphi(s)\,ds = F_1(y)$.

The function $F_1(y)$ has the derivative

(8) $F_1'(y) = -p\,\varphi(y) - f_1'(0)\,\varphi(y) - \int_0^y f_1''(y - s)\varphi(s)\,ds$.

Since $f_1'' \ge 0$, $p + f_1'(0) > k + f_1'(0) = 0$, we see that $F_1(y)$ is monotone decreasing, and there can be at most one root of (7). However, $F_1(0) = p > k$, $F_1(\infty) = 0$. Hence there is precisely one root. Call this root $\bar{x}_2$.

The policy is then

(9) $y = \bar{x}_2, \quad 0 \le x \le \bar{x}_2$,

    $y = x, \quad \bar{x}_2 \le x$.

The geometric picture is illuminating. Write (6) in the form

(10) $f_2(x) + kx = \min_{y \ge x} v(y)$,

where $v(y)$ is a known function. From what we have demonstrated above, $v(y)$ has the following graph

[Figure 1: graph of $v(y)$]

The function $f_2(x) + kx$ is obtained by drawing the tangent to $v(y)$ at $y = \bar{x}_2$ and continuing it to the left until it hits the $v$-axis. The function $f_2(x) + kx$ is now constant for $0 \le x \le \bar{x}_2$ and equal to $v(x)$ for $x \ge \bar{x}_2$.

[Figure 2: graph of $f_2(x) + kx$, with $\bar{x}_2$ marked on the horizontal axis]

It remains to show that $\bar{x}_2 > \bar{x}_1$. The quantity $\bar{x}_1$ is determined by equation (4), while $\bar{x}_2$ is determined by (7). Since $-f_1' \ge 0$, it follows that the curve

(11) $w = g_2(y) = p \int_y^\infty \varphi(s)\,ds - \int_0^y f_1'(y - s)\varphi(s)\,ds$

always lies above the curve

(12) $w = g_1(y) = p \int_y^\infty \varphi(s)\,ds$.

[Figure 3: the curves $w = g_1(y)$ and $w = g_2(y)$]


From this it is clear that $\bar{x}_2 > \bar{x}_1$.

In order to continue this proof inductively, we must show that

(13) $-f_2'(x) \ge -f_1'(x)$.

We have

(14) $-f_1'(x) = k, \quad 0 \le x \le \bar{x}_1$,

     $-f_1'(x) = p \int_x^\infty \varphi(s)\,ds, \quad x \ge \bar{x}_1$,

and

(15) $-f_2'(x) = k, \quad 0 \le x \le \bar{x}_2$,

     $-f_2'(x) = p \int_x^\infty \varphi(s)\,ds - \int_0^x f_1'(x - s)\varphi(s)\,ds, \quad x \ge \bar{x}_2$.

In the intervals $[0, \bar{x}_1]$ and $[\bar{x}_2, \infty]$, the inequality is clear. In $[\bar{x}_1, \bar{x}_2]$, the inequality follows from the monotonicity of $k - p \int_x^\infty \varphi(s)\,ds$, which is zero at $x = \bar{x}_1$.

Finally, we wish to demonstrate the convexity of $f_2(x)$. This is clearly true in $[0, \bar{x}_2]$. In $[\bar{x}_2, \infty]$, we have, using (15),

(16) $f_2''(x) = p\,\varphi(x) + f_1'(0)\,\varphi(x) + \int_0^x f_1''(x - s)\varphi(s)\,ds$.

Since $f_1'(0) + p > 0$, $f_1'' \ge 0$, we have $f_2''(x) > 0$.

We  now  have  all  the  ingredients  of  an  inductive  proof. 

§  8.  Finite  time — multi-dimensional  case 

The  hardy  reader  may  verify  that  the  solution  in  the  multidimensional 
case  has  precisely  the  same  general  character. 

§ 9. Non-proportional penalty cost — red tape

As soon as we consider the case where the penalty is not directly proportional to the excess of demand over supply, we encounter difficulties, and it appears that the simple and elegant solution obtained for the case of proportional cost is no longer valid generally.

There are, however, a number of interesting cases in which we still obtain a solution involving constant stock level. The most interesting of these occur when we take the cost of ordering $(s - y)$ to be $p(s - y) + q$, where $q$ is a fixed administrative cost which appears whenever an excess demand occurs, regardless of the amount of the demand. The initial ordering cost is still assumed to be proportional.
Let us then consider the equation

(1) $f(x) = \min_{y \ge x} \Big[ k(y - x) + a \Big[ \int_y^\infty [\,p(s - y) + q\,]\,\varphi(s)\,ds + f(0) \int_y^\infty \varphi(s)\,ds + \int_0^y f(y - s)\varphi(s)\,ds \Big] \Big]$,

distinguished from the equation we have considered above only by the additional term $aq \int_y^\infty \varphi(s)\,ds$. It is surprising how much complication this innocuous-appearing expression would seem to introduce.

We shall, to begin with, proceed formally on the assumption that there is a constant stock level solution. The critical level is then determined by the solution of

(2) $0 = k + a \Big[ -p \int_y^\infty \varphi(s)\,ds - q\,\varphi(y) + \int_0^y f'(y - s)\varphi(s)\,ds \Big]$,

and we have $f'(x) = -k$ when $y > x$.

It follows then that $\bar{x}$ will be a root of

(3) $0 = k + a \Big[ -p \int_y^\infty \varphi(s)\,ds - q\,\varphi(y) - k \int_0^y \varphi(s)\,ds \Big]$.

Unfortunately, it is not true that this equation has a unique root for all density functions $\varphi(s)$. This equation may be written in the form

(4) $(1 - a)\,k = a(p - k) \int_y^\infty \varphi(s)\,ds + aq\,\varphi(y)$.

A simple condition under which this equation has a unique root is $\varphi'(y) \le 0$.
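Whether (4) has one root or several for a given density can be examined numerically. The following sketch is illustrative only: it assumes the density `pdf` and its tail integral `tail` are supplied as functions (hypothetical argument names), scans a grid for sign changes of the difference between the two sides of (4), and refines each by bisection. Roots at which the difference touches zero without changing sign would be missed by this scan.

```python
import math

def roots_of_eq4(k, p, q, a, pdf, tail, y_max=20.0, n=2000, tol=1e-10):
    """Sign-change roots on [0, y_max] of
    h(y) = a*(p - k)*tail(y) + a*q*pdf(y) - (1 - a)*k,
    where tail(y) is the integral of pdf over [y, infinity)."""
    def h(y):
        return a * (p - k) * tail(y) + a * q * pdf(y) - (1.0 - a) * k
    roots = []
    step = y_max / n
    for i in range(n):
        lo, hi = i * step, (i + 1) * step
        if h(lo) * h(hi) < 0.0:
            # Bisection inside the bracketing cell.
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if h(lo) * h(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots
```

For a decreasing density such as the assumed example $\varphi(y) = b e^{-by}$, the right-hand side of (4) is strictly decreasing in $y$, so the scan should report a single root, which in that case equals $\ln\big(a(p - k + qb)/((1 - a)k)\big)/b$.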

If  we  do  assume  that  this  equation  has  a  unique  root,  the  proof  is  almost 
exactly  as  before.  There  is,  however,  a  more  general  result  where  the 
optimal  policy  is  that  of  constant  stock  level,  which  we  shall  now  discuss. 

If the equation above, (3) or (4), does not possess a unique root, it may still happen that the largest root of (4) corresponds to an absolute minimum of the function in the brackets in (1), over the interval $[0, \infty]$.

Thus  the  picture  may  be 
Let us prove

Theorem 4. Under the assumptions upon $a$, $k$, $p$, $q$ and $\varphi(s)$, stated in Theorem 1, and the additional assumption that the last minimum of

(6) $\psi(y) = ky + a \Big[ \int_y^\infty [\,p(s - y) + q\,]\,\varphi(s)\,ds - k \int_0^y (y - s)\varphi(s)\,ds \Big]$

is the absolute minimum in $0 \le y \le \infty$, the optimal policy in (1) is given by the rule

(7) (a) $y = \bar{x}$, for $0 \le x \le \bar{x}$,

    (b) $y = x$, for $x \ge \bar{x}$,

where $\bar{x}$ is the value of $y$ where the absolute minimum is attained.

Proof. Let $\bar{x}$ be the value of $y$ which yields the last minimum, and the absolute minimum in the interval $[0, \infty]$, of the function $\psi(y)$ above. Then, precisely as in the case where $q = 0$, we have $f(x) = f(0) - kx$ in $0 \le x \le \bar{x}$, and $f(0)$ is determined by substituting this result in (1), in the range $0 \le x \le \bar{x}$. In the interval $[\bar{x}, \infty]$, $f(x)$ is determined by setting $y = x$ in (1).

The proof that $f(x)$ actually satisfies the equation now continues in exactly the same way as in the case where $q = 0$.

§  10.  Particular  cases 

Some  particular  cases  where  the  above  conditions  are  satisfied  are 
(1)  (a)  tp(x)  =  e-<*-°>7  J°°  c-“’  du 

(b) $\varphi(x) = b\,e^{-bx}$

We  leave  the  verification  as  exercises  for  the  reader. 

§  11.  The  form  of  the  general  solution 

Let $f(x)$ be the solution of (9.1), which is to say

(1) $f(x) + kx = \min_{y \ge x} F(y)$,

where

(2) $F(y) = ky + a \Big[ \int_y^\infty p(s - y)\varphi(s)\,ds + (f(0) + q) \int_y^\infty \varphi(s)\,ds + \int_0^y f(y - s)\varphi(s)\,ds \Big]$.
Let $F(y)$ have the graph

(3) [graph of $F(y)$]

Then, the optimal policy has the following form

(4) (a) $y = x_1, \quad 0 \le x \le x_1$,

    (b) $y = x, \quad x_1 \le x \le x_2$,

    (c) $y = x_3, \quad x_2 < x < x_3$,

    (d) $y = x, \quad x \ge x_3$,

and

(5) $f(x) + kx = F(x_1), \quad 0 \le x \le x_1$,

    $\phantom{f(x) + kx} = F(x), \quad x_1 \le x \le x_2$,

    $\phantom{f(x) + kx} = F(x_3), \quad x_2 \le x \le x_3$,

    $\phantom{f(x) + kx} = F(x), \quad x_3 \le x < \infty$.

However,  the  problem  of  determining  how  many  different  regions  exist, 
given  the  cost  functions  and  the  demand  functions,  and  how  to  fit  this 
information  together,  seems  quite  difficult,  and  is  unsolved  at  the  present 
time. 

§  12.  Fixed  costs 

Let  us  now  consider  the  case  where  there  is  a  constant  red-tape  cost  in 
initial  ordering.  This  problem  is  also  unsolved  to  date. 

The equation now has the form

(1) $f(x) = \min_{y \ge x} \Big[ k(y - x) + g(y - x) + a \Big[ \int_y^\infty p(s - y)\varphi(s)\,ds + f(0) \int_y^\infty \varphi(s)\,ds + \int_0^y f(y - s)\varphi(s)\,ds \Big] \Big]$,

where

(2) $g(x) = g, \quad x > 0$,

    $g(x) = 0, \quad x = 0$.

Here $g$ represents the fixed cost.

It is tempting to envisage a solution of the following form

(3) $y = S$ for $0 \le x \le s$,

    $y = x$ for $s < x$,

where $0 < s < S < \infty$. A policy of this type is called an "sS-policy."

Policies  of  this  type  are  used  in  various  establishments,  and  have  a  fine 
intuitive  flavor.  Unfortunately,  it  is  easy  to  construct  relatively  simple 
examples  which  show  that  this  policy  cannot  be  optimal  in  all  cases,  and 
there  the  matter  rests. 
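For concreteness, the rule of an sS-policy is easily written out. The sketch below only illustrates the rule itself and makes no claim about optimality; the function names are hypothetical, and the stock transition follows the model of this chapter, in which demand beyond the stock on hand is met by penalty ordering so that the next stage starts from zero.

```python
def order_up_to(x, s, S):
    """Stock level after ordering under an sS-policy:
    raise the stock to S when x <= s, otherwise leave it unchanged."""
    return S if x <= s else x

def simulate(x0, demands, s, S):
    """Stock level at the start of each stage for a given demand sequence."""
    levels, x = [], x0
    for d in demands:
        levels.append(x)
        y = order_up_to(x, s, S)
        x = max(y - d, 0)  # excess demand leaves the stock at zero
    return levels
```

For example, with `s = 2`, `S = 5`, an empty initial stock and demands `[3, 1, 4, 2]`, the stock at the start of the four stages is 0, 2, 4 and 0.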

§  13.  Preliminaries  to  a  discussion  of  more  complicated 
policies 

In  the  previous  sections  we  have  considered  some  processes  having 
solutions  of  quite  simple  and  intuitive  form.  We  now  wish  to  consider  two 
cases  in  which  the  solutions  are  of  a  more  complicated  nature.  The  first  of 
these  will  be  one  involving  a  time-lag  in  the  fulfilling  of  orders,  the  second 
will  treat  the  case  where  the  initial  ordering  function  is  a  non-linear 
convex  function  of  the  amount  ordered,  with  no  red-tape  cost  in  either 
case. 

In  both  cases  we  shall  employ  the  method  of  successive  approximations 
to  determine  the  properties  of  the  solution. 

§  14.  Unbounded  process— one  period  time  lag 

The functional equation we shall consider is that derived in § 2, namely

(1) $f(x) = \min_{z \ge 0} \Big[ kz + a \Big[ \int_x^\infty p(s - x)\varphi(s)\,ds + f(z) \int_x^\infty \varphi(s)\,ds + \int_0^x f(x - s + z)\varphi(s)\,ds \Big] \Big]$.

We shall prove

Theorem 5. The optimal policy is given by the rule

(2) $z = z(x)$ for $0 \le x \le \bar{x}$,

    $z = 0$ for $x \ge \bar{x}$,

where $z(x) \ge 0$ and $z(\bar{x}) = 0$.

This function $z(x)$ is monotone decreasing in $x$.

Proof. The proof will proceed by induction, based upon the following sequence of successive approximations

(3) $f_0(x) = a \Big[ \int_x^\infty p(s - x)\varphi(s)\,ds + f_0(0) \int_x^\infty \varphi(s)\,ds + \int_0^x f_0(x - s)\varphi(s)\,ds \Big]$,

(a function we have repeatedly encountered before), and for $n = 0, 1, 2, \ldots$

(4) $f_{n+1}(x) = \min_{z \ge 0} T(z, x, f_n)$,

where $T(z, x, f_n)$ is the expression contained within the brackets in (1).

Let us now consider $T(z, x, f_0)$ as a function of $z$, say $M_1(z)$.

We have

(5) $M_1'(z) = k + a f_0'(z) \int_x^\infty \varphi(s)\,ds + a \int_0^x f_0'(x - s + z)\varphi(s)\,ds$,

and the second derivative is

(6) $M_1''(z) = a f_0''(z) \int_x^\infty \varphi(s)\,ds + a \int_0^x f_0''(x - s + z)\varphi(s)\,ds$.

Since $f_0'' > 0$, we see that $M_1''(z) > 0$ for all $x \ge 0$. Hence the equation $M_1'(z) = 0$ has at most one root in $z$ for any $x$. For large $x$, it is clear that there will be no root, and for small $x$, say $x = 0$, there will be a root provided that $a$, $p$ and $k$ are properly related, a point we will check subsequently. Meanwhile, let us show that this root, which we call $z_1(x)$, is monotone decreasing in $x$.

To show this consider the expression $G_0(x, z) = -a f_0'(z) \int_x^\infty \varphi(s)\,ds - a \int_0^x f_0'(x - s + z)\varphi(s)\,ds$, as a function of $x$ for fixed $z$. Its derivative with respect to $x$ is

(7) $\dfrac{\partial G_0}{\partial x} = -a \int_0^x f_0''(x - s + z)\varphi(s)\,ds$,

which is negative. Hence, the family of curves $w = G_0(x, z)$ looks as follows

This graph very clearly shows that $z_1(x)$ is monotone decreasing in $x$, and equal to zero for $x \ge \bar{x}_1$.

In order to obtain similar results for the second approximation, we must show that $f_1''(x) \ge 0$. We have

(8) $f_1(x) = T(z_1(x), x, f_0), \quad 0 \le x \le \bar{x}_1$,

    $f_1(x) = T(0, x, f_0), \quad \bar{x}_1 \le x$.

In $[0, \bar{x}_1]$, we have

(9) $f_1'(x) = -ap \int_x^\infty \varphi(s)\,ds + a \int_0^x f_0'(x - s + z)\varphi(s)\,ds$,

and

(10) $f_1''(x) = ap\,\varphi(x) + a f_0'(z)\,\varphi(x) + \Big( a \int_0^x f_0''(x - s + z)\varphi(s)\,ds \Big) \Big( 1 + \dfrac{dz}{dx} \Big)$.

From (9), we see that $f_1'(0) = -ap$. Since $f_0'(x)$ is monotone increasing in $x$ it follows that $ap + a f_0'(z) \ge 0$ for $z \ge 0$. Hence we will have $f_1''(x) \ge 0$ if we show that $1 + dz_1/dx \ge 0$.

To do this we return to the equation defining $z_1$, namely $M_1'(z) = 0$. Using the expression in (5), we see that

(11) $\Big[ a f_0''(z) \int_x^\infty \varphi(s)\,ds + a \int_0^x f_0''(x - s + z)\varphi(s)\,ds \Big] \dfrac{dz}{dx} + a \int_0^x f_0''(x - s + z)\varphi(s)\,ds = 0$,

which shows anew that $dz_1/dx \le 0$ and that $1 + dz_1/dx \ge 0$.

We require finally a relation connecting $f_0'(x)$ and $f_1'(x)$.

We have

(12) $f_0'(x) = -ap \int_x^\infty \varphi(s)\,ds + a \int_0^x f_0'(x - s)\varphi(s)\,ds$.

Hence in $[0, \bar{x}_1]$ we see that $f_0'(x) \le f_1'(x)$, since $f_0'(x)$ is monotone increasing in $x$. Since $f_1(x) = f_0(x)$ for $x \ge \bar{x}_1$, we have

(13) $f_0'(x) \le f_1'(x)$

for all $x \ge 0$.

Continuing as in the preceding pages, we see that we obtain a function $z_n(x)$ for each $n$ having the property that

(14) (a) $z_n(x) > 0$ for $0 \le x < \bar{x}_n$,

     (b) $z_n(x) = 0$ for $\bar{x}_n \le x$,

and $z_n(x)$ monotone decreasing in $x$. Furthermore the sequence $\bar{x}_n$ will be monotone decreasing and possess a limit $\bar{x}$.

It remains to show that $\bar{x} \ne 0$ if $a$, $p$ and $k$ are suitably related. This is equivalent to checking as to whether or not $f_0(x)$ is the solution. Returning to (5), we set $x = 0$ and examine the equation

(15) $k + a f_0'(z) = 0$.

If $k + a f_0'(0) < 0$, there will be a solution of this equation. Turning to (12), we see that $f_0'(0) = -ap$. Hence we require

(16) $k < a^2 p$,

the intuitive and expected condition for a process involving a one-stage delay.

§  15.  Convex  cost  function — unbounded  process 

As  another  illustration  of  the  power  of  the  method  of  successive  approx¬ 
imations,  let  us  consider  the  case  where  the  cost  of  ordering,  g  (y  —  x), 
is  a  strictly  convex  function  of  the  amount,  y  —  x,  ordered.  The  equation 
is  now 

(1)  /(*)  =  Min  \g[y  —  x)  +  a[  f  p  (s  —  y)  q>  (s)  ds  +  /  (0)  f  <p(s)ds 

y  >  x  Jv  JV 

+  j“  f  (y  —  s)  v  (s)  &]] . 

As  usual,  we  set 

(2)  /oW  =  a[  f  p{s  —  x)q>(s)ds+fo(0)  (  <p(s)  ds (Zf0(x  —  s)<p(s)ds]  , 

Jx  Jz  Jo 

and,  /or  n  —  1,  2,  . . ., 

(3)  fn  +  i  (*)  =  Min  T  (y,  x,fn) . 

y  ^  i 

Let us begin with the consideration of $f_1(x)$, assuming that $g(x)$ possesses
a continuous derivative for $x \ge 0$.

If $y > x$, $y$ is determined by the equation

(4)  $g'(y - x) = a \Bigl[ p \int_y^\infty \varphi(s)\,ds - \int_0^y f_0'(y - s)\varphi(s)\,ds \Bigr]$.

Since  we  have  assumed  that  g  (x)  is  convex,  i.e.  g"  (x)  >  0,  it  follows  that 
this  equation  can  have  at  most  one  root,  since  the  left  side  is  monotone 
increasing  and  the  right-side  monotone  decreasing. 
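
As a check on (4), the differentiation that produces it can be sketched as follows. The sketch assumes that $f_0$ is differentiable and that differentiation under the integral sign is permitted; neither assumption is examined here. $T(y, x, f_0)$ denotes the bracketed expression in (1) with $f$ replaced by $f_0$.

```latex
\begin{align*}
\frac{\partial}{\partial y} T(y, x, f_0)
  &= g'(y - x) + a\Bigl[ -p\int_y^\infty \varphi(s)\,ds - f_0(0)\varphi(y)
     + f_0(0)\varphi(y) + \int_0^y f_0'(y - s)\varphi(s)\,ds \Bigr] \\
  &= g'(y - x) - a\Bigl[ p\int_y^\infty \varphi(s)\,ds
     - \int_0^y f_0'(y - s)\varphi(s)\,ds \Bigr].
\end{align*}
```

The two terms in $f_0(0)\varphi(y)$ cancel, and setting the derivative equal to zero at an interior minimum $y > x$ gives (4).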

For $x = 0$, there is a root provided that

(5)  $g'(0) < ap$.

For $x$ large, there is no root if $g'(0) > 0$.

If $y > x$, we have

(6)  $f_1'(x) = -g'(y - x)$,

and

(7)  $f_1''(x) = -g''(y - x)\,(dy/dx - 1)$.

To determine the magnitude of $dy/dx - 1$, we turn to (4). This yields

(8)  $g''(y - x)\,(dy/dx - 1) = \Bigl[ -ap\,\varphi(y) - a f_0'(0)\,\varphi(y) - a \int_0^y f_0''(y - s)\varphi(s)\,ds \Bigr]\, dy/dx$.

From this we readily conclude that $dy/dx > 0$ and $dy/dx - 1 < 0$.
Hence $f_1''(x) > 0$.

Furthermore,  we  see  that  —  /,'  ;>  —  We  now  have  all  the  details 
of  an  inductive  proof  of 

Theorem 6. There is a function $y(x)$ and a number $\bar{x}$ with the properties

(9)  (a) $y(x) \ge x$, $y(x)$ is monotone increasing,
     (b) $y(x) > x$ for $x < \bar{x}$, $y(x) = x$ for $x \ge \bar{x}$,
     (c) $\bar{x} > 0$ if $ap > g'(0)$.

This function $y(x)$ is the optimal policy in (1).


Appendix  Chapter  V — The  Renewal  Equation 

The equation

(1)  $u(x) = f(x) + \int_0^x u(x - s)\varphi(s)\,ds$,

which occurs in a great many different areas of analysis, is commonly
called the renewal equation.

There are two important methods available for establishing properties
of the solutions: the method of the Laplace transform, and the
Liouville-Neumann method of solution, which is the method of successive
approximations.

The Laplace transform technique owes its success to the fact that
$\int_0^x u(x - s)\varphi(s)\,ds$ is a convolution having the formal property that

(2)  $\int_0^\infty e^{-tx} \Bigl[ \int_0^x u(x - s)\varphi(s)\,ds \Bigr] dx = \Bigl( \int_0^\infty e^{-tx} u(x)\,dx \Bigr) \Bigl( \int_0^\infty e^{-ts} \varphi(s)\,ds \Bigr)$.

Hence (1) yields, proceeding completely formally,

(3)  $\int_0^\infty e^{-tx} u(x)\,dx = \int_0^\infty e^{-tx} f(x)\,dx \Big/ \Bigl( 1 - \int_0^\infty e^{-tx} \varphi(x)\,dx \Bigr)$,

from which a great deal can be deduced concerning the asymptotic
behavior of $u(x)$ as $x \to \infty$, using either Tauberian theorems or complex
variable theory, under appropriate assumptions concerning $f$ and $\varphi$.
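
The step from (1) and (2) to (3) can be written out as follows. The shorthand $U(t)$, $F(t)$, $\Phi(t)$ for the three transforms is introduced only for this sketch, and convergence of the integrals is assumed rather than proved.

```latex
\begin{align*}
U(t) &= \int_0^\infty e^{-tx} u(x)\,dx, \quad
F(t) = \int_0^\infty e^{-tx} f(x)\,dx, \quad
\Phi(t) = \int_0^\infty e^{-tx} \varphi(x)\,dx, \\
U(t) &= F(t) + U(t)\,\Phi(t) \qquad \text{(transform of (1), using (2))}, \\
U(t) &= \frac{F(t)}{1 - \Phi(t)} \qquad \text{(provided } \Phi(t) \neq 1\text{)}.
\end{align*}
```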

However, the properties of most interest to us here, positivity,
convexity, et al., can most readily be deduced by considering the sequence
of approximants

(4)  $u_0 = f(x)$,
     $u_{n+1} = f(x) + \int_0^x u_n(x - s)\varphi(s)\,ds$,

and showing that each function $u_n(x)$ has the required property.

This  approach  is  justified  by  the  following  result. 

Theorem 9. Let us assume that

(5)  a. $f(x)$ is bounded in every finite interval $[0, x_0]$,
     b. $\int_0^\infty |\varphi(s)|\,ds < 1$.

Then there is a unique solution to (1) which is bounded in any interval
$[0, x_0]$.

This solution may be obtained as the limit of the sequence given by (4).

If $f(x)$ is differentiable and $\varphi(x)$ is continuous, we have

(6)  $u'(x) = f'(x) + u(0)\varphi(x) + \int_0^x u'(x - s)\varphi(s)\,ds$.

If $f(x) \ge 0$, $\varphi(x) \ge 0$, then $u(x) \ge 0$.

There  are  a  number  of  other  combinations  of  conditions  corresponding 
to  those  given  in  (5a)  and  (5b)  which  also  yield  existence  and  uniqueness. 

The  proof  of  Theorem  9  is  readily  obtained  following  the  techniques 
we  have  by  now  applied  many  times  over. 
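
The successive approximations (4) can be tried numerically. The following is a minimal sketch, not a definitive implementation: the function name, the uniform grid, the trapezoidal quadrature, and the test case $f = 1$, $\varphi(s) = \tfrac{1}{2} e^{-s}$ are all assumptions made for illustration. For this test case the Laplace transform relation (3) gives the closed form $u(x) = 2 - e^{-x/2}$, which is used only as a check.

```python
import math


def renewal_successive_approximations(f, phi, x_max=5.0, n_grid=201, n_iter=40):
    """Iterate u_{n+1}(x) = f(x) + int_0^x u_n(x - s) phi(s) ds on a uniform grid.

    The integral is approximated by the trapezoidal rule. Grid size and
    iteration count are illustrative choices, not taken from the text.
    """
    h = x_max / (n_grid - 1)
    xs = [i * h for i in range(n_grid)]
    f_vals = [f(x) for x in xs]
    phi_vals = [phi(x) for x in xs]

    u = list(f_vals)  # u_0 = f
    for _ in range(n_iter):
        new_u = []
        for i in range(n_grid):
            # Integrand u_n(x_i - s_j) * phi(s_j) at the nodes s_j = j*h, j = 0..i.
            vals = [u[i - j] * phi_vals[j] for j in range(i + 1)]
            integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
            new_u.append(f_vals[i] + integral)
        u = new_u
    return xs, u


# Hypothetical test case: f = 1 and phi(s) = 0.5 * exp(-s), so int |phi| = 0.5 < 1.
xs, u = renewal_successive_approximations(lambda x: 1.0,
                                          lambda s: 0.5 * math.exp(-s))
exact = [2.0 - math.exp(-0.5 * x) for x in xs]
max_error = max(abs(a - b) for a, b in zip(u, exact))
```

Since $f \ge 0$ and $\varphi \ge 0$ in this test case, every iterate is non-negative, in agreement with the last statement of Theorem 9.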


Exercises  and  research  problems 

1. Obtain the analogue of Theorem 3 for the case where the distribution
function of demand varies from stage to stage.

2. Consider the case where there are fixed costs in both initial purchasing
and the penalty cost, and the distribution of demand has the form
$\varphi(x) = 1/k$, $0 \le x \le k$, $\varphi(x) = 0$, $x > k$.

3. Consider the process with a fixed cost in the case where there are only
two levels of demand, high and low. Can one generalize the result
obtained here to the case of an arbitrary finite number of different demands?

4.  Obtain  the  analogues  of  Theorems  1,  2,  and  3  in  the  case  where  there 
is  a  storage  cost  at  each  stage  proportional  to  the  quantity  of  items  stored 
over  the  previous  stage. 

5.  Obtain  the  functional  equations  corresponding  to  the  process  in  which 
both  the  demands  and  times  of  demand  are  random.  Consider  the  cases 
where  the  times  of  demand  have  a  continuous  distribution  and  a  discrete 
distribution. 

6.  Obtain  the  analogue  of  Theorem  5  for  processes  with  arbitrary  time 
lags. 

7.  Consider  the  case  where  there  is  fixed  cost  and  determine 

a.  The  "constant  stock  level”  policy  which  minimizes  expected  cost 

b.  The  "sS”-policy  which  minimizes  expected  cost. 

8.  We  are  interested  in  producing  a  single  item  over  a  given  number  of 
time  periods  in  order  to  satisfy  known  future  demands.  We  wish  to  do 
this  in  such  a  way  as  to  minimize  costs,  knowing  the  costs  for  production, 
storage,  and  change  in  production  rate  as  functions  of  time. 

Let  us  consider  the  discrete  version  first.  Let 

$T$ = the number of periods,
$r_t$ = demand at time $t$,
$x_t$ = amount produced in time interval $[t - 1, t]$,
$x_0$ = given initial production,
$y_t = x_{t+1} - x_t \ge 0$, the increase in production rate at time $t$,
$u_t$ = excess of supply over demand at time $t$.

The costs are

$c_t$ = cost of producing an item in the period $[t - 1, t]$,
$d_t$ = cost of storing an item in excess of demand for one period,
$e_t$ = cost of increasing production rate by one unit per unit of time.

Assume that we wish to minimize the total cost of the $T$-period process
under the condition that the supply must always exceed the demand.

9. Consider the above problem under the condition that production
cannot be expanded in an arbitrary fashion. In particular, discuss the
two cases

a. $x_t \le x_{t+1} \le a x_t$, $1 < a < \infty$,

b. $x_t \le x_{t+1} \le x_t + b$, $b > 0$.

10.  Consider  the  case  where  the  demand  is  stochastic,  under  the  following 
two  alternative  assumptions 

a.  Demand  must  always  be  satisfied 

b.  Demand  can  be  postponed  one  stage 

11.  Obtain  the  functional  equations  corresponding  to  the  process  de¬ 
scribed  in  §  2  under  the  assumption  that  we  desire  to  minimize  the  prob¬ 
ability  that  the  cost  exceeds  a  given  quantity  c. 

12.  Consider  the  functional  equations  discussed  in  the  chapter  under  the 
assumption  that  the  distribution  function  <p  (s)  ds  is  replaced  by  the  more 
general  Stieltjes  distribution  dG  (s).  Obtain  the  requisite  existence  and 
uniqueness  theorems  and  determine  in  which  ways  the  theorems  esta¬ 
blished  above  must  be  modified  in  order  to  remain  valid. 


13. In what ways is the problem of ordering for a military supply depot
different from the problem of ordering items for a department store?

14.  Assume  that  there  is  no  penalty  for  not  being  able  to  meet  the  de¬ 
mand,  but  that  there  is  a  return  of  b  dollars  for  each  item  demanded  and 
supplied.  Suppose  that  this  return  can  be  used  to  increase  the  quantity 
available  at  the  next  stage.  Given  an  initial  stock  of  x,  and  a  supply  of 
money  equal  to  y,  how  should  one  order  so  as  to  maximize  the  total  ex¬ 
pected  return  ?  Consider  both  finite  and  infinite  processes  under  the 
assumption  of  proportional  costs. 


15. Consider the equation

$f(x) = \max_{0 \le y \le x} \Bigl[ g(y) + h(x - y) + \int_0^y f(y - s) k(s)\,ds \Bigr]$,

where

(a) $g(0) = h(0) = 0$,
(b) $g'(y) > 0$, $h'(y) > 0$, $g'(0) < h'(0)$,
(c) $k(s) \ge 0$,
(d) $g''(y) > 0$, $h''(y) > 0$,
(e) $h(y) - g(y)$ is monotone increasing in $y$.

Show that the solution is given by

$f(x) = h(x)$, $0 \le x \le \bar{x}$,
$f(x) = g(x) + \int_0^x f(x - s) k(s)\,ds$, $x \ge \bar{x}$,

where $\bar{x}$ is determined as the non-zero root of

$h(x) = g(x) + \int_0^x h(x - s) k(s)\,ds$.

16.  Consider  a  situation  where  one  must  order  items  to  be  sold  in  antici¬ 
pation  of  an  uncertain  demand  which  can  be  taken  as  a  known  stochastic 
variable. Let equal ordering periods be indexed $0, 1, 2, \ldots$, and the demand
be described by a distribution

$F_i(x)$ = probability that the demand is less than or equal
to $x$ in period $i$.

Let  p  be  the  unit  selling  price  and  C  (y),  assumed  differentiable,  be  the 
total  ordering  cost  for  y  units  in  any  period ;  I  be  the  inventory  at  the 
beginning  of  the  present  period  (period  0) ;  and  suppose  that  all  units 
ordered  at  the  beginning  of  the  period  are  immediately  available — units 
may  be  ordered  only  once  during  a  period  and  cannot  be  disposed  of 
except  through  sales  at  price  p  on  demand. 

Given any ordering policy, $y_i$, at each stage there will be a cash return
of

$P_i(x_i, y_i, I_i) = p \min(I_i + y_i, x_i) - C(y_i)$.

Let the purpose of the process be to maximize the expected value of

$\sum_{i=0}^{\infty} a^i P_i(x_i, y_i, I_i)$, $0 < a < 1$.

Show that the resultant system of recurrence relations is

$f_k(I) = \max_{y \ge 0} \int_0^\infty \bigl[ p \min(I + y, x) - C(y) + a f_{k+1}(\max(0, I + y - x)) \bigr]\, dF_k(x)$,

and solve in the case where $C(y) = cy$. (Harlan D. Mills)

17. Consider the equation

$f(x) = \min_{z \ge 0} \Bigl[ kz + a \int_x^\infty p(s - x)\varphi(s)\,ds + a f(z) \int_x^\infty \varphi(s)\,ds + a \int_0^x f(x - s + z)\varphi(s)\,ds \Bigr]$,

corresponding to a one period lag in supply.

Assuming that the optimal policy is to choose $z$ so that $x + z = L$
for $0 \le x \le L$, and $z = 0$ for $x > L$, determine $L$.

18.  Prove  or  disprove  that  this  is  the  optimal  policy. 

19.  Examine  the  conjecture  that  in  the  general  k  period  lag  case,  the 
optimal  policy  is  to  order  nothing  if  the  sum  of  the  quantities  on  order 
and  on  hand  exceed  a  certain  quantity  L,  and  to  order  a  quantity  equal 
to  the  difference  if  L  exceeds  this  sum. 


Bibliography and Comments for Chapter V

§ 1. The mathematical model of the inventory problem we consider here
originated in the pioneer paper of K. J. Arrow, T. E. Harris and J. Marschak,
“Optimal Inventory Policy,” Econometrica, July, 1951. Stimulated by their
investigations, two further papers appeared: A. Dvoretzky, J. Kiefer and
J. Wolfowitz, “The Inventory Problem I, II,” Econometrica, vol. 20 (1952),
pp. 187-222.

The  first  of  these  papers  is  devoted  to  existence  and  uniqueness  of  the 
solution  of  the  basic  functional  equation,  and  to  a  discussion  of  some  parti¬ 
cular  processes.  The  second  paper  is  more  statistical  in  nature  and  devoted 
to  the  question  of  determining  the  distribution  of  functions  of  demand  as 
the  process  continues. 

The results of this chapter were obtained in conjunction with
I. Glicksberg and O. Gross; see R. Bellman, I. Glicksberg and O. Gross,
“On the Optimal Inventory Equation,” Management Science, vol. 2 (1955),
pp. 83-104.

Since  the  appearance  of  these  papers,  a  large  number  of  papers,  both 
published  and  privately  circulated,  have  appeared  on  the  topic  of  inventory 
control.  We  suggest  that  the  interested  reader  thumb  through  the  pages  of 
Econometrica, Jour. Soc. Ind. Appl. Math., Jour. Operations Research Society,
Management Science, and Naval Quarterly Jour. of Logistics, where he will
find  further  results  and  references. 

§  3.  The  results  discussed  here  are  in  accordance  with  the  remark  of  an 
earlier  chapter  that  the  derivatives  of  the  return  functions,  or  "marginal 
returns,” possess a simpler structure than the return functions in many cases.

Appendix. Further results concerning renewal equations and functions of
similar type may be found in W. Feller, “On the Integral Equation of
Renewal Theory,” Ann. Math. Stat., vol. 12 (1941); R. Bellman (with the
collaboration of J. M. Danskin), “A Survey of the Theory of Time-Lag,
Retarded Control and Hereditary Processes,” RAND Corporation, 1954,
R-271.


Bottleneck Problems in Multi-Stage
Production Processes

§  1.  Introduction 

In  this  chapter  we  shall  discuss  a  particular  class  of  significant  and 
difficult  variational  problems  arising  from  the  study  of  multi-stage  pro¬ 
duction  processes. 

We  shall  first  formulate  a  discrete  version  of  the  process,  which  under 
certain  assumptions  of  proportionality  of  output  to  input  leads  us  to  the 
problem  of  determining  the  maximum  of  a  linear  form  subject  to  linear 
constraints,  a  basic  problem  to  which  the  theory  of  linear  programming 
has  made  notable  contributions  in  recent  years.  Although  the  state  of 
analytic  research  on  this  fundamental  problem  is  still  in  its  early  stages,  a 
large  class  of  problems  arising  in  applications  can  be  successfully  resolved 
numerically, with the aid of modern computing machines and various
iterative  techniques  such  as  the  "simplex”  technique. 

The  study  of  bottleneck  processes,  however,  which  combine  a  moderate 
number  of  activities  at  each  stage  with  a  large  number  of  stages,  encoun¬ 
ters  the  usual  difficulty  of  dimensionality  if  conventional  computational 
methods  are  used.  As  in  the  treatment  of  the  processes  of  the  previous 
chapters,  we  can  circumvent  this  obstacle  to  some  degree  by  using  a  for¬ 
mulation  in  terms  of  functional  equations.  Since,  however,  we  are  interest¬ 
ed  in  explicit  analytic  solutions,  in  order  to  study  the  character  of  optimal 
policies,  we  shall  formulate  continuous  versions  of  processes.  It  is  worth 
emphasizing  that  the  continuous  process  may  actually  be  closer  to  re¬ 
ality  than  the  discrete  version  in  many  cases.  An  essential  weapon  in  our 
mathematical  armory  is  the  use  of  the  dual  continuous  process,  thus  ex¬ 
ploiting  the  linearity  of  the  process. 

To  illustrate  the  method,  we  shall  treat  a  simple  process  in  detail,  in 
this  chapter,  while  a  more  complicated  process  will  be  discussed  in  the 
subsequent  chapter.  In  many  cases,  these  analytic  methods,  applied  with 
faith  and  resolution,  permit  us  to  obtain  explicit  analytic  solutions  of  the 
maximization problem, together with an explicit description of the optimal
policies. Many difficulties, however, remain as far as the construction
of a general theory is concerned. Examining the following pages, the
reader will quickly see that the mathematical theory of these problems is
in its rudimentary stages.

The variational problem is that of determining the maximum over the
vector function $z(t)$ of the inner product $(x(T), a)$, where $x$ and $z$ are
connected by the vector-matrix differential equation

(1)  $dx/dt = Ax + Bz$, $x(0) = c$,

and $z$ satisfies the constraint

(2)  $Cz \le Dx$,
for $0 \le t \le T$.

The  techniques  we  employ  to  discuss  this  problem  will  be  further  de¬ 
veloped  and  applied  to  classical  problems  in  the  calculus  of  variations 
in  Chapter  9. 


§  2.  A  General  class  of  multi-stage  production  problems 

A  central  problem  in  the  theory  and  application  of  mathematical  eco¬ 
nomics  is  that  of  integrating  a  complex  of  industries,  of  similar  or  varie¬ 
gated  type,  so  as  to  produce  a  given  product  in  a  most  efficient  manner. 
Here  the  criterion  of  efficiency  may  be  minimum  time,  or  maximum 
profit,  or  some  combination  of  both. 

As  an  example,  which  is  quite  elementary  from  the  economic  point  of 
view,  but  sufficiently  advanced  from  the  mathematical  viewpoint  to 
generate  problems  which  we  cannot  resolve  as  readily  as  we  would  desire, 
let us consider a simple model of a three-industry production process
where  the  individual  industries  are  the  “auto”  industry,  the  “steel”  in¬ 
dustry,  and  the  “tool”  industry.1 

In  this  highly  condensed  or  “lumped”  model  of  economic  interplay  * 
we  shall  assume  that  the  state  of  each  industry  is  completely  specified  at 
any  time  by  its  stockpile  of  raw  material  and  by  its  capacity  to  produce 
new  quantities  using  these  raw  materials.  Furthermore,  we  shall  begin  by 
assuming  that  it  is  sufficient  to  consider  that  changes  in  these  basic 
quantities, stockpile and capacity, occur only at discrete times
$t = 0, 1, 2, \ldots, T$.


1  Needless  to  say,  these  names  are  used  merely  to  guide  our  intuition.  It  is 
not  suggested  that  any  deep  significance  be  attached  to  them. 

* This type of lumping is precisely analogous to what is done in the study of
electric circuits in the low frequency case, where we introduce the concepts of
“resistance”, “inductance” and “capacitance”.

Let us then define the following state variables:

(1)  $x_1(t)$ = number of autos produced up to time $t$,
     $x_2(t)$ = capacity of auto factories at time $t$,
     $x_3(t)$ = stockpile of steel at time $t$,
     $x_4(t)$ = capacity of steel mills at time $t$,
     $x_5(t)$ = stockpile of tools at time $t$,
     $x_6(t)$ = capacity of tool factories at time $t$.

We  make  the  following  assumptions  concerning  the  interdependence  of 
these  three  industries: 

(2)  a.  An  increase  in  auto,  steel  or  tool  capacity  requires  only  steel  and 

tools; 

b.  Production  of  autos  requires  only  auto  capacity  and  steel; 

c.  Production  of  steel  requires  only  steel  capacity; 

d.  Production  of  tools  requires  only  tool  capacity  and  steel. 

The  dynamics  of  the  production  process  may  be  described  as  follows: 
At  the  beginning  of  each  unit  time  period,  say  t  to  t  -f  1,  we  allocate 
various  quantities  of  steel  and  tools,  taken  from  their  respective  stock¬ 
piles,  for  the  purposes  of  producing  autos,  steel,  and  tools  —  which  is  to 
say  increasing  the  stockpiles  of  these  quantities — and  for  the  purposes  of 
increasing  the  auto,  steel,  and  tool  capacities. 

Let, for $i = 1, 2, \ldots, 6$,

(3)  a. $z_i(t)$ = amount of steel allocated at time $t$ for the purpose of
        increasing $x_i(t)$,
     b. $w_i(t)$ = amount of tools allocated at time $t$ for the purpose of
        increasing $x_i(t)$.

Upon referring to the assumptions in (2) we see that

(4)  a. $z_3(t) = 0$,
     b. $w_1(t) = w_3(t) = w_5(t) = 0$.

In order to obtain relations connecting $x_i(t+1)$ with $x_i(t)$, $z_i(t)$ and
$w_i(t)$, we must make some further assumptions concerning the relations
between output and input. The simplest assumption to make is that we
have a linear production process with output of an item always directly
proportional to the minimum input of required resources.3 Thus, production
is directly proportional to capacity whenever there is an abundance
of raw materials, i.e., stockpile, and directly proportional to the minimum
quantity of raw materials whenever there is an abundance of capacity.

3 As we have observed in the preface, this may not actually be the simplest
for mathematical purposes. A more realistic assumption predicated upon a law
of diminishing returns, involving nonlinear functions, may actually lead to a
simpler mathematical problem. The reason for this is that nonlinear functions
take more kindly to a variational approach. On the other hand, linear problems
may be more readily treated numerically, in some cases.

It  is  because  of  this  dependence  upon  the  minimum  resource  that  we 
use  the  name  "bottleneck  problems.” 

As an illustration, the increase in the number of autos from $t$ to $t+1$
will depend upon the capacity of auto factories at $t$, $x_2(t)$, and the
quantity $z_1(t)$ of steel, as defined above in (3a). Since production depends upon
the minimum of capacity and supply of raw material, we obtain the
equation

(5)  $x_1(t+1) = x_1(t) + \min(\gamma_1 x_2(t), \alpha_1 z_1(t))$,

where $\gamma_1$ and $\alpha_1$ are taken to be known positive constants.

In a similar fashion, combining the assumptions in (2) with those of the
previous paragraph, we obtain the following equations which relate
$x_i(t+1)$ to $x_i(t)$, $z_i(t)$, and $w_i(t)$:4

(6)  $x_1(t+1) = x_1(t) + \min(\gamma_1 x_2(t), \alpha_1 z_1(t))$,
     $x_2(t+1) = x_2(t) + \min(\alpha_2 z_2(t), \beta_2 w_2(t))$,
     $x_3(t+1) = x_3(t) - z_1(t) - z_2(t) - z_4(t) - z_5(t) - z_6(t) + \gamma_3 x_4(t)$,
     $x_4(t+1) = x_4(t) + \min(\alpha_4 z_4(t), \beta_4 w_4(t))$,
     $x_5(t+1) = x_5(t) - w_2(t) - w_4(t) - w_6(t) + \min(\gamma_5 x_6(t), \alpha_5 z_5(t))$,
     $x_6(t+1) = x_6(t) + \min(\alpha_6 z_6(t), \beta_6 w_6(t))$,

where $\alpha_i$, $\beta_i$, and $\gamma_i$ are constants.

The constraints upon $z_i$ and $w_i$ are obviously

(7)  (a) $z_i, w_i \ge 0$,
     (b) $z_1 + z_2 + z_4 + z_5 + z_6 \le x_3$,
     (c) $w_2 + w_4 + w_6 \le x_5$,

together with the “common-sense” constraints

(8)  (a) $\alpha_1 z_1 \le \gamma_1 x_2$,
     (b) $\alpha_2 z_2 = \beta_2 w_2$,
     (c) $\alpha_4 z_4 = \beta_4 w_4$,
     (d) $\alpha_5 z_5 \le \gamma_5 x_6$,
     (e) $\alpha_6 z_6 = \beta_6 w_6$.

4 All these equations are conservation equations which state that the quantity
of an item at time $t+1$ is the quantity at time $t$, minus the quantity used over
$[t, t+1]$, plus the quantity produced over $[t, t+1]$.

The meaning of these equations is that there is no advantage to any
allocation beyond the capacity of production, and again that the minimum
resource determines the production level.

By means of these additional constraints we may eliminate the variables
$w_i$ completely, obtaining in place of (6) the system of equations:

(9)  $x_1(t+1) = x_1(t) + \alpha_1 z_1(t)$, $x_1(0) = c_1$,
     $x_2(t+1) = x_2(t) + \alpha_2 z_2(t)$, $x_2(0) = c_2$,
     $x_3(t+1) = x_3(t) - z_1(t) - z_2(t) - z_4(t) - z_5(t) - z_6(t) + \gamma_3 x_4(t)$, $x_3(0) = c_3$,
     $x_4(t+1) = x_4(t) + \alpha_4 z_4(t)$, $x_4(0) = c_4$,
     $x_5(t+1) = x_5(t) - \varepsilon_2 z_2(t) - \varepsilon_4 z_4(t) - \varepsilon_6 z_6(t) + \alpha_5 z_5(t)$, $\varepsilon_i = \alpha_i/\beta_i$, $x_5(0) = c_5$,
     $x_6(t+1) = x_6(t) + \alpha_6 z_6(t)$, $x_6(0) = c_6$.

The constraints, in turn, have the form, for each $t$:

(10) (a) $z_i \ge 0$,
     (b) $z_1 + z_2 + z_4 + z_5 + z_6 \le x_3$,
     (c) $\varepsilon_2 z_2 + \varepsilon_4 z_4 + \varepsilon_6 z_6 \le x_5$,
     (d) $z_1 \le f_2 x_2$,
     (e) $z_5 \le f_6 x_6$.

We must now choose the $z_i(t)$ for $t = 0, 1, 2, \ldots, T - 1$, subject to
the above constraints, so as to maximize $x_1(T)$.
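
One stage of the discrete process can be sketched in code. This is an illustration under stated assumptions rather than anything given in the text: the function names, the dictionary layout keyed by the subscripts 1 to 6, and all numerical values are hypothetical, and the constants $f_2$ and $f_6$ are taken to be $\gamma_1/\alpha_1$ and $\gamma_5/\alpha_5$, as the constraints (8a) and (8d) suggest.

```python
def is_feasible(x, z, eps, f2, f6, tol=1e-12):
    """Check the constraints (10) for a state x and an allocation z.

    x maps 1..6 to the state variables; z maps 1, 2, 4, 5, 6 to the steel
    allocations (z3 is identically zero by (4a)).
    """
    if any(v < -tol for v in z.values()):
        return False                                    # (10a)
    if sum(z.values()) > x[3] + tol:
        return False                                    # (10b) steel stockpile
    if eps[2] * z[2] + eps[4] * z[4] + eps[6] * z[6] > x[5] + tol:
        return False                                    # (10c) tool stockpile
    if z[1] > f2 * x[2] + tol:
        return False                                    # (10d) auto capacity
    if z[5] > f6 * x[6] + tol:
        return False                                    # (10e) tool capacity
    return True


def step(x, z, alpha, eps, gamma3):
    """Apply the recurrence (9) once, returning the state at time t + 1."""
    return {
        1: x[1] + alpha[1] * z[1],
        2: x[2] + alpha[2] * z[2],
        3: x[3] - (z[1] + z[2] + z[4] + z[5] + z[6]) + gamma3 * x[4],
        4: x[4] + alpha[4] * z[4],
        5: x[5] - (eps[2] * z[2] + eps[4] * z[4] + eps[6] * z[6]) + alpha[5] * z[5],
        6: x[6] + alpha[6] * z[6],
    }


# Hypothetical constants and data, chosen only to exercise the functions.
alpha = {1: 1.0, 2: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}
eps = {2: 0.5, 4: 0.5, 6: 0.5}          # eps_i = alpha_i / beta_i with beta_i = 2
gamma3 = 1.0
f2, f6 = 1.0, 1.0                        # assumed gamma_1/alpha_1 and gamma_5/alpha_5
x0 = {1: 0.0, 2: 2.0, 3: 10.0, 4: 3.0, 5: 4.0, 6: 1.0}
z0 = {1: 2.0, 2: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}

if is_feasible(x0, z0, eps, f2, f6):
    x1 = step(x0, z0, alpha, eps, gamma3)
```

Choosing the allocations for every $t$ then amounts to applying `step` repeatedly to feasible allocations and comparing the resulting values of $x_1(T)$.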

§  3.  Discussion  of  the  preceding  model 

It is easy to see that $x_1(T)$, the total number of autos produced over
the time period $[0, T]$, may be written as a linear expression in the
quantities $z_i(t)$, $t = 0, 1, 2, \ldots, T - 1$, $i = 1, 2, \ldots, 6$. The problem of
maximizing $x_1(T)$ subject to the linear constraints of (2.10) is
consequently within the domain of linear programming. It may be solved
computationally for explicit values of the coefficients and the time $T$, by
iterative processes of various types, provided that $T$ is not too large. In
particular, for dynamic processes of the kind considered here, a number
of important simplifications are possible.

However,  in  general,  in  analyses  of  the  type  presented  here,  we  are  not 
so  much  interested  in  the  numerical  solution  corresponding  to  any  par¬ 
ticular  set  of  constants  as  we  are  in  the  complete  set  of  numerical  values 
obtained  from  a  range  of  parameter  values.  In  other  words,  in  most  cases 
the whole interest of the investigation lies in a “sensitivity analysis,” or
equivalently a “stability analysis,” of the solution.

This sensitivity analysis is required because of the many assumptions
we have made, such as linearity of output, the crude description of
industries in terms of lumped capacities and stockpiles we have employed, the
absence of time lag or “lead time” in production, and so on. Any
conclusions concerning the structure of optimal policies that may be drawn
from the simplified mathematical model can have validity only if these
conclusions are relatively insensitive to the precise values of various
parameters which occur.

It  is  clear  from  what  we  have  said  above  that  the  numerical  work 
involved  in  performing  any  reliable  sensitivity  analysis  using  purely 
computational  techniques,  involving  as  it  does  a  probing  of  many-dimen¬ 
sional  space,  will  be  tedious,  time  consuming,  and  inevitably  incomplete. 

The  question  arises  then  as  to  whether  or  not  it  is  possible  to  determine 
the  intrinsic  structure  of  an  optimal  policy,  regardless  of  any  numerical 
values  we  may  subsequently  assign  to  the  parameters.  This  knowledge  is 
not  only  of  importance  in  itself,  in  allowing  us  to  make  a  complete  sensi¬ 
tivity  analysis  of  the  solution,  but  is  also  extremely  helpful  in  determining 
approximate  solutions  in  cases  where  explicit  analysis  seems  hopeless, 
and  in  furnishing  analytic  clues  to  the  solution  of  more  complicated  pro¬ 
cesses. 

As  a  first  step  towards  obtaining  the  solution,  both  analytically  and 
computationally,  we  shall  reformulate  the  problem  in  terms  of  functional 
equations. 


§  4.  Functional  equations 

It is clear that the total output of cars obtained using an optimal
allocation policy is a function only of the initial resources, $c_1, \ldots, c_6$, and
the duration of the process, $T$. Furthermore, $c_1$ need not be explicitly
mentioned.

Let us then define for $T = 1, 2, \ldots$

(1)  $f(c_2, c_3, \ldots, c_6; T)$ = the total output obtained over a time interval
     $T$ starting with initial resources $c_i$, $i = 2, 3, \ldots, 6$, and employing
     an optimal policy.

Employing the principle of optimality, we obtain the following functional
equation for $f(c_2, c_3, \ldots, c_6; T)$:

(2)  $f(c_2, c_3, \ldots, c_6; T + 1) = \max_{Z} \bigl[ \alpha_1 z_1 + f(c_2', c_3', \ldots, c_6'; T) \bigr]$,

where

(3)  $c_2' = c_2 + \alpha_2 z_2$,
     $c_3' = c_3 - z_1 - z_2 - z_4 - z_5 - z_6 + \gamma_3 c_4$,
     $c_4' = c_4 + \alpha_4 z_4$,
     $c_5' = c_5 - \varepsilon_2 z_2 - \varepsilon_4 z_4 - \varepsilon_6 z_6 + \alpha_5 z_5$,
     $c_6' = c_6 + \alpha_6 z_6$,

and $Z$ denotes the region in the $(z_1, z_2, z_4, z_5, z_6)$-space defined by the
following inequalities:

(4)  (a) $z_i \ge 0$,
     (b) $z_1 + z_2 + z_4 + z_5 + z_6 \le c_3$,
     (c) $\varepsilon_2 z_2 + \varepsilon_4 z_4 + \varepsilon_6 z_6 \le c_5$,
     (d) $z_1 \le f_2 c_2$,
     (e) $z_5 \le f_6 c_6$.

The  analytic  problem  of  determining  /,  and,  more  importantly,  the 
nature  of  the  optimal  policy  is  still  one  of  great  difficulty.  The  computa¬ 
tional  problem  is  also  formidable  involving  as  it  does  the  tabulation  of  a 
function  of  five  variables  for  each  value  of  T.  The  homogeneity  of  the 
process  enables  us  to  reduce  this  to  a  problem  involving  four  variables. 
We  shall  refer  to  this  fact  again  in  following  sections. 

The  computational  problem  involved  in  determining  the  maximum 
over  the  region  Z,  a  polyhedral  region  bounded  by  planes,  may  be  greatly 
simplified  by  observing  that  the  maximum  occurs  at  vertices. 
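
The recurrence (2) can be illustrated by a direct, deliberately crude computation. The sketch below is not the method of the text: instead of examining the vertices of $Z$ it searches a coarse grid of allocations, so for more than one stage it yields only a lower bound on $f$. The function names, the convention $f(\cdot\,; 0) = 0$, the grid, and all numerical values are assumptions made for illustration, and each $\varepsilon_i$ is assumed positive.

```python
from itertools import product

KEYS = (1, 2, 4, 5, 6)  # z3 is identically zero by (2.4a)


def feasible(c, z, eps, f2, f6, tol=1e-9):
    """Inequalities (4): c maps 2..6 to resources, z maps KEYS to allocations."""
    return (
        all(v >= -tol for v in z.values())
        and sum(z.values()) <= c[3] + tol
        and eps[2] * z[2] + eps[4] * z[4] + eps[6] * z[6] <= c[5] + tol
        and z[1] <= f2 * c[2] + tol
        and z[5] <= f6 * c[6] + tol
    )


def transition(c, z, alpha, eps, gamma3):
    """Equations (3): the resources available at the start of the next stage."""
    return {
        2: c[2] + alpha[2] * z[2],
        3: c[3] - sum(z.values()) + gamma3 * c[4],
        4: c[4] + alpha[4] * z[4],
        5: c[5] - (eps[2] * z[2] + eps[4] * z[4] + eps[6] * z[6]) + alpha[5] * z[5],
        6: c[6] + alpha[6] * z[6],
    }


def best_output(c, T, alpha, eps, gamma3, f2, f6, levels=3):
    """Grid-search version of (2); the cost grows like (levels**5)**T."""
    if T == 0:
        return 0.0  # assumed convention: no stages, no output
    # Largest value each z_k can take on its own, from (4b)-(4e).
    caps = {
        1: min(c[3], f2 * c[2]),
        2: min(c[3], c[5] / eps[2]),
        4: min(c[3], c[5] / eps[4]),
        5: min(c[3], f6 * c[6]),
        6: min(c[3], c[5] / eps[6]),
    }
    grids = [[caps[k] * j / (levels - 1) for j in range(levels)] for k in KEYS]
    best = 0.0
    for values in product(*grids):
        z = dict(zip(KEYS, values))
        if not feasible(c, z, eps, f2, f6):
            continue
        nxt = transition(c, z, alpha, eps, gamma3)
        tail = best_output(nxt, T - 1, alpha, eps, gamma3, f2, f6, levels)
        best = max(best, alpha[1] * z[1] + tail)
    return best


# Hypothetical data, chosen only to exercise the recursion.
alpha = {1: 1.0, 2: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}
eps = {2: 0.5, 4: 0.5, 6: 0.5}
c = {2: 2.0, 3: 10.0, 4: 3.0, 5: 4.0, 6: 1.0}
one_stage = best_output(c, 1, alpha, eps, 1.0, 1.0, 1.0)
two_stage = best_output(c, 2, alpha, eps, 1.0, 1.0, 1.0)
```

For a single stage the search is exact, since the best allocation puts $z_1 = \min(c_3, f_2 c_2)$ and every other $z_i$ equal to zero, a point that lies on the grid. The rapid growth of the work with $T$ is one way of seeing the dimensionality difficulty mentioned above.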

§  5.  A  Continuous  version  * 

To  simplify  the  analytic  problem,  we  shall  transform  the  discrete 
process  into  a  continuous  process.  In  so  doing,  our  purpose  is  to  avail 
ourselves of the combination of the powerful methods of calculus,
together with the resources of linear algebra. It is very often true, in dealing
with the physical world, that continuous models are far simpler to discuss
than discrete models.

To obtain a continuous version, we assume that decisions are made at
times $0, \Delta t, 2\Delta t$, and so on, and that the allocations $z_i(t)$, $w_i(t)$ previously
made over the time interval $[t, t+1]$ are replaced by allocations $z_i(t)\Delta t$,
$w_i(t)\Delta t$ over the interval $[t, t + \Delta t]$. The quantities $z_i(t)$ and $w_i(t)$
are now rates of allocation of resources.

Turning to the equations in (2.9) describing the discrete process and
allowing $\Delta t$ to approach zero, the new equations take the form

(1)  $\dot{x}_1(t) = \alpha_1 z_1(t)$, $x_1(0) = c_1$,
     $\dot{x}_2(t) = \alpha_2 z_2(t)$, $x_2(0) = c_2$,
     $\dot{x}_3(t) = -z_1(t) - z_2(t) - z_4(t) - z_5(t) - z_6(t) + \gamma_3 x_4(t)$, $x_3(0) = c_3$,
     $\dot{x}_4(t) = \alpha_4 z_4(t)$, $x_4(0) = c_4$,
     $\dot{x}_5(t) = -\varepsilon_2 z_2(t) - \varepsilon_4 z_4(t) - \varepsilon_6 z_6(t) + \alpha_5 z_5(t)$, $x_5(0) = c_5$,
     $\dot{x}_6(t) = \alpha_6 z_6(t)$, $x_6(0) = c_6$.

(The dot signifies differentiation with respect to $t$.)

* Chapter VIII is devoted to a similar continuous version of the discrete process
of Chapter II.

The constraints upon the $z_i$ are now

(2)  (a) $z_i \ge 0$,
     (b) $z_1 + z_2 + z_4 + z_5 + z_6 \le \infty$,
     (c) $\varepsilon_2 z_2 + \varepsilon_4 z_4 + \varepsilon_6 z_6 \le \infty$,
     (d) $z_1 \le f_2 x_2$,
     (e) $z_5 \le f_6 x_6$.

This means that the constraints of (2b) and (2c) disappear. Two
conditions which were automatically satisfied before must now be added.
These are the conditions that the stockpiles be non-negative at all times,

(3)  (b') $x_3 \ge 0$,
     (c') $x_5 \ge 0$.

From these constraints we see that whenever $x_3 = 0$ we must have

(4)  $z_1 + z_2 + z_4 + z_5 + z_6 \le \gamma_3 x_4$,

and similarly when $x_5 = 0$ we must have

(5)  $\varepsilon_2 z_2 + \varepsilon_4 z_4 + \varepsilon_6 z_6 \le \alpha_5 z_5$.

It  follows  that  zlP  zs,  z4,  and  ze  are  unbounded  whenever  x3  and  x5  are 
positive.  This  means  that  delta-function  type  solutions  may  occur.  This 
point  will  be  discussed  in  more  detail  in  the  subsequent  chapter  where  an 
example  involving  this  type  of  solution  is  discussed.  However,  a  rigorous 
discussion  of  this  feature  of  a  solution  will  be  postponed  until  the  second 
volume.  We  shall  proceed  essentially  in  a  formal  manner  in  this  and  the 
following  chapter  at  various  points  where  a  rigorous  discussion  would  take 
us  too  far  afield.* 

*  It is important to point out that the continuous process is described by the
above  equations.  A  detailed  discussion  of  this  point  is  given  in  Chapter  VIII, 
where  we  also  discuss  the  connection  between  the  discrete  and  continuous  processes. 
The problem is now to maximize x_1(T) subject to the above constraints.

After  some  discussion  of  notation,  we  shall  approach  this  problem 
using  the  functional  equation  approach  of  dynamic  programming. 

§  6.  Notation 

Let  us  introduce  vector-matrix  notation  which  will  greatly  simplify 
the  notation  and  thus  be  of  considerable  help  in  presenting  the  general 
theoretical  approach,  unclouded  by  a  superabundance  of  superscripts. 
Following the discussion of the basic concepts, we shall consider a particular
example, to illustrate the analytic minutiae, which are not trivial.

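Let x(t), z(t), and c denote respectively the n-dimensional column
vectors

x(t) = (x_1(t), x_2(t), ..., x_n(t))',   z(t) = (z_1(t), z_2(t), ..., z_n(t))',   c = (c_1, c_2, ..., c_n)',

and A_i, B_j, for such values of i and j as occur, denote n x n matrices.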

We shall be dealing only with vectors x and z whose components are
non-negative. To indicate this fact we use the notation x \ge 0 to denote
the relations x_i \ge 0, i = 1, 2, ..., n. The inequality x \ge y is equivalent
to x - y \ge 0.

Turning to the equation in (5.1), we see that it may be written

(1)  dx/dt = A_1 x + A_2 z,   x(0) = c,

where A_1 and A_2 are matrices determined by the coefficients in (5.1).
Similarly, the constraints in (5.2)-(5.5) take the form

(2)  z \ge 0,

     B_1 z \le B_2 x.

The problem of maximizing x_1(T) is a particular case of the problem of
maximizing a linear combination, \sum_{i=1}^{n} a_i x_i(T). To express this in simple
form, we introduce the inner product of two vectors x and y, namely

(3)  (x, y) = \sum_{i=1}^{n} x_i y_i.

The general problem is then that of choosing z(t) so as to maximize
(x(T), a) where a is a given vector, subject to the relations given above in
(1) and (2).
One of the difficulties that arises in the continuous case, and not in the
discrete process, is that this maximum may not exist if we restrict z(t) to
be a function in the usual sense. We shall proceed on the assumption that
the constraints in (4.18) are sufficient to ensure the existence of a maximizing
function. This will be the case if (4.18) has the form z \le B_2 x,
where B_2 is a positive matrix. A complete treatment will require the use
of Stieltjes integrals.

§  7.  Dynamic  programming  formulation 

Since the forms of the equations in (4.17) and (4.18) are time-independent,
it follows that Max_z (x(T), a) (where we shall assume throughout
the remainder of this expository chapter that the maximum actually
exists) is a function only of T and the components of c, which is to say, of
the initial stockpiles and capacities, the state variables and the duration
of the process.

Let us then write

(1)  Max_z (x(T), a) = f(c, T) = f(c_1, c_2, ..., c_n; T).

§  8.  The  basic  functional  equation 

We shall now derive a functional equation for f using the Principle of
Optimality,^7 which in this case states that the nature of any optimal
allocation policy over the interval [0, T], which is to say, one which yields
the maximum of (x(T), a), is such that its continuation over any final
sub-interval [S, T] must be an optimal policy for a process of duration
T - S starting from the initial state c(S).

Here c(S) is the vector x(S) obtained from (6.1) using an allocation
policy over [0, S].

The  mathematical  transliteration  of  the  verbal  principle  yields  the 
functional  equation 

(1)  f(c, S + T) = f(c(S), T)

for an optimal policy over [0, S + T].

It follows that the policy over [0, S] is determined by the equation

(2)  f(c, S + T) = Max_{[0, S]} f(c(S), T),

where we maximize over all feasible policies over [0, S], that is to say,
over all z(t) satisfying the constraints.

Equation (2), together with the initial condition f(c, 0) = (c, a), is the
basic functional equation governing the process.

7  Chapter  III,  §  3. 
§  9.  The resultant nonlinear partial differential equation

Let us now use the basic equation in (8.2) to derive a partial differential
equation for f, on the assumption that f and x possess the requisite differentiability
properties. As we shall see below, it is quite permissible to
proceed formally at this point since we shall derive a technique for verifying
the validity of any proposed solution.

Let us take S to be an infinitesimal. Then we have

(1)  (a)  f(c, S + T) = f(c, T) + S \partial f/\partial T + o(S),

     (b)  c(S) = c + S [A_1 c + A_2 z(0)] + o(S),

     (c)  f(c(S), T) = f(c + S [A_1 c + A_2 z(0)], T) + o(S)

                     = f(c, T) + S (A_1 c + A_2 z(0), \partial f/\partial c) + o(S),

where \partial f/\partial c denotes the vector

(2)  \partial f/\partial c = (\partial f/\partial c_1, \partial f/\partial c_2, ..., \partial f/\partial c_n)'.

As S shrinks to 0, the maximum over the interval [0, S] shrinks to a
maximum at S = 0, or a maximum over z(0), under our assumptions of
continuity. With reference to the expansions in (1) above, we see that the
infinitesimal analogue of the functional equation (8.2) is the nonlinear
partial differential equation

(3)  \partial f/\partial T = Max_{z(0)} (A_1 c + A_2 z(0), \partial f/\partial c),

where z(0) is constrained by the equations in (6.2).

§  10.  Application  of  the  partial  differential  equation 

The importance of the equation in (9.3) resides in the fact that it permits
us to determine the solution over [0, T + \Delta T] if the solution has
already been determined over [0, T] for all initial states.

It turns out to be true that in many of these problems the difficulties
are readily resolved for small T, since for processes of short duration, the
obvious, crude, policies are optimal. It follows then that we have, in
theory, a systematic means of continuing the solution up to any desired
value of T. Although systematic, the details are by no means trivial, as
we shall see in the next chapter.

In the next section, we shall go through the analysis involved in resolving
a relatively simple problem. Much of the analysis can be discarded,
once we have ascertained the structure of the solution, which in many
cases is plausible on economic grounds.


§  11.  A  Particular  example 

As an application of the general approach presented above, let us now
consider the problem of maximizing x_2(T), where

(1)  dx_1/dt = a_1 z_1,         x_1(0) = c_1,

     dx_2/dt = a_2 z_2 - z_1,   x_2(0) = c_2,

and z_1, z_2, the rates of allocation, as functions of t are subject to the
following constraints:

(2)  (a)  z_1, z_2 \ge 0,

     (b)  z_1 + z_2 \le x_2,

     (c)  z_2 \le x_1,

     (d)  x_2 \ge 0,

for 0 \le t \le T.

In this case, the rates z_1 and z_2 are uniformly bounded, and it is easy to
see, using either a direct weak convergence argument, or relying upon
classical theorems in the calculus of variations, that the maximum is
assumed. Hence we may set in rigorous fashion,

(3)  f(c_1, c_2, T) = Max_{[0, T]} x_2(T).

As in the general case, f satisfies the functional equation

(4)  f(c_1, c_2, S + T) = Max_{[0, S]} f(x_1(S), x_2(S), T),

which, in the limit as S -> 0, yields the partial differential equation

(5)  \partial f/\partial T = Max_z [a_1 z_1 \partial f/\partial c_1 + (a_2 z_2 - z_1) \partial f/\partial c_2],

which, at the moment, we recall, is purely formal, since we do not know
whether or not f has the requisite continuity properties.
The maximum is taken over the region defined by

(6)  (a)  0 \le z_1, z_2,

     (b)  z_1 + z_2 \le c_2,

     (c)  z_2 \le c_1,

with the additional constraint

(7)  a_2 z_2 - z_1 \ge 0,

if x_2 = 0. The variables are now z_1 = z_1(0), z_2 = z_2(0).

Let us now sketch the analytic procedure that will yield a solution. We
begin with the most complicated case, that where c_2 < c_1. For a process
of short duration, the solution is trivial. We have

(8)  z_1 = 0,  z_2 = x_2,

     f = c_2 e^{a_2 T}.

This policy is pursued until a "bottleneck" develops, which is to say,
c_2 exceeds c_1. Using the optimal policy described in (8) we see that this
situation will occur as soon as T exceeds T_1 = log (c_1/c_2)/a_2.
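
The short-duration claim can be checked numerically. The sketch below is only an illustration under stated assumptions: it reads the example as dx_1/dt = a_1 z_1, dx_2/dt = a_2 z_2 - z_1 with the region (6), replaces the functional equation (4) by a step of fixed length dt, and searches only the corner points of the region. The numerical values of a_1, a_2, c_1, c_2 and dt are invented for the illustration, and the names `vertices` and `f_discrete` are not from the text.

```python
import math

A1, A2 = 0.5, 1.0          # illustrative rates a_1, a_2 (assumed values)

def vertices(c1, c2):
    """Corner points of region (6): z1, z2 >= 0, z1 + z2 <= c2, z2 <= c1."""
    pts = [(0.0, 0.0), (c2, 0.0), (0.0, min(c1, c2))]
    if c2 > c1:
        pts.append((c2 - c1, c1))   # the vertex called Q in the text
    return pts

def f_discrete(c1, c2, steps, dt):
    """Discretized form of (4): take one step of length dt, then continue optimally."""
    if steps == 0:
        return c2                   # f(c1, c2, 0) = c2, the quantity being maximized
    best = -math.inf
    for z1, z2 in vertices(c1, c2):
        next_c1 = c1 + dt * A1 * z1
        next_c2 = c2 + dt * (A2 * z2 - z1)
        best = max(best, f_discrete(next_c1, next_c2, steps - 1, dt))
    return best

c1, c2, dt, steps = 2.0, 1.0, 0.05, 8      # c2 < c1, and T = steps * dt < T_1
T = steps * dt
T1 = math.log(c1 / c2) / A2
value = f_discrete(c1, c2, steps, dt)
greedy = c2 * (1.0 + A2 * dt) ** steps     # policy (8) applied at every step
print(T < T1, value, greedy, c2 * math.exp(A2 * T))
```

With these values the exhaustive search returns the same number as the policy z_1 = 0, z_2 = x_2 applied at every step, and that number approaches c_2 e^{a_2 T} from below as dt shrinks. The discretization ignores the side condition (7), which never becomes active here because x_2 stays positive.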

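To obtain the solution for T > T_1, we rewrite (5) in the form

(9)  \partial f/\partial T = Max_z [z_1 (a_1 \partial f/\partial c_1 - \partial f/\partial c_2) + a_2 z_2 \partial f/\partial c_2].

The location of the maximizing point (z_1(0), z_2(0)) will depend upon the
sign and magnitude of the coefficients of z_1 and z_2. For T < T_1 we have

(10)  a_1 \partial f/\partial c_1 - \partial f/\partial c_2 = -e^{a_2 T},   a_2 \partial f/\partial c_2 = a_2 e^{a_2 T}.

Using our assumption concerning the continuity of \partial f/\partial c_1, \partial f/\partial c_2, we
suspect that the solution for T slightly greater than T_1 will have the
form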


(11)  (a)  z_1 = 0,  z_2 = x_2   for 0 \le S \le T_1,

      (b)  z_1 = 0,  z_2 = x_1   for T_1 < S \le T.

Applying this policy, f takes the form

(12)  f = c_1 + (T - T_1) a_2 c_1,

where T_1 is as above. In order to determine how long this policy endures
when T > T_1, we consider the process as starting from S = T_1. In terms
of c_1' = c_1(T_1), c_2' = c_2(T_1), f has the form

(13)  f = c_2' + a_2 c_1' (T - T_1).
The equation which replaces (9) has precisely the same form with c_1, c_2
replaced by c_1', c_2', namely

(14)  \partial f/\partial T = Max_{z(T_1)} [z_1 (a_1 \partial f/\partial c_1' - \partial f/\partial c_2') + a_2 z_2 \partial f/\partial c_2'].

We have, using (13),

(15)  a_1 \partial f/\partial c_1' - \partial f/\partial c_2' = a_1 a_2 (T - T_1) - 1,   a_2 \partial f/\partial c_2' = a_2.

The coefficient of z_1 is negative for T < T* = T_1 + 1/a_1 a_2, 0 at T*, and
positive thereafter.

It follows that the new policy given by (11) remains optimal for
T_1 < T \le T*.
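
The closed form for f on T_1 < T \le T* can be compared with a direct simulation. The following is a rough sketch, not part of the original argument: it assumes the dynamics dx_1/dt = a_1 z_1, dx_2/dt = a_2 z_2 - z_1, integrates them by forward Euler, and uses invented parameter values. The function names and the comparison policy `policy_q` (which diverts the surplus x_2 - x_1 into capacity as soon as it appears) are hypothetical.

```python
import math

def simulate(c1, c2, a1, a2, T, policy, steps=20000):
    """Forward-Euler integration of dx1/dt = a1*z1, dx2/dt = a2*z2 - z1."""
    dt = T / steps
    x1, x2 = c1, c2
    for _ in range(steps):
        z1, z2 = policy(x1, x2)
        x1, x2 = x1 + dt * a1 * z1, x2 + dt * (a2 * z2 - z1)
    return x1, x2

def policy_11(x1, x2):
    # Policy (11): never build capacity, produce as fast as the constraints allow.
    return 0.0, min(x1, x2)

def policy_q(x1, x2):
    # Comparison policy: spend any surplus x2 - x1 on capacity immediately.
    return max(x2 - x1, 0.0), min(x1, x2)

c1, c2, a1, a2 = 2.0, 1.0, 0.5, 1.0        # illustrative values with c2 < c1
T1 = math.log(c1 / c2) / a2                # time at which the bottleneck appears
T_star = T1 + 1.0 / (a1 * a2)
T = 0.5 * (T1 + T_star)                    # a duration strictly between T1 and T*

closed_form = c1 + a2 * c1 * (T - T1)
_, x2_policy_11 = simulate(c1, c2, a1, a2, T, policy_11)
_, x2_policy_q = simulate(c1, c2, a1, a2, T, policy_q)
print(closed_form, x2_policy_11, x2_policy_q)
```

For a duration in this range the simulated value under (11) agrees with c_1 + (T - T_1) a_2 c_1 up to discretization error, while the comparison policy ends with a smaller stockpile, as the sign of the coefficient of z_1 predicts.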

Furthermore, since T* - T_1 is independent of c_1 and c_2, we see that
we know the form of the optimal policy over a tail interval.

It remains to determine what the policy is in the middle of the interval
[0, T] in the general case when T exceeds T*. We suspect from an examination
of the vertices in the figure below that it has the form

(16)  z_1 = x_2 - x_1,  z_2 = x_1.

It is instructive to consider the region determined by the constraints
in (6) when c_2 > c_1.


Figure 1


When maximizing over z, the three crucial points are the vertices P,
Q and R, where P = (0, c_1), Q = (c_2 - c_1, c_1), R = (c_2, 0). It is the
principle of continuity which leads us to choose Q as the maximizing vertex
as soon as c_2 surpasses c_1.
Instead of verifying this directly, which may be done, we shall describe
in  the  next  section  a  more  elegant  technique  which  exploits  the  linearity 
of  the  process.  This  technique  serves  not  only  as  a  means  of  verifying 
proposed  solutions,  but  also  as  a  theoretical  tool  for  the  determination  of 
the  nature  of  optimal  policies. 

§  12.  A  Dual  problem 

Let us, to illustrate the principles we shall employ, take our basic
equation to have the form

(1)  dx/dt = Az,   x(0) = c,

with constraints of the form

(2)  (a)  z \ge 0,

     (b)  Bz \le x,

     (c)  x \ge 0.

Note that the equation in (6.1) may always be written in the form of
(1), if A_1 \ge 0, by first writing it in the form

(3)  dx/dt = A_1 w + A_2 z,   x(0) = c,

with the constraints

(4)  (a)  z \ge 0,

     (b)  Bz \le x,

     (c)  w \le x,

and then combining the vectors w and z into one. However, an equation
of the type appearing in (6.1) may also be treated directly by these
methods.

Since x = c + \int_0^t Az dt_1, the constraint of (2b) may be written

(5)  Bz + \int_0^t Cz dt_1 \le c,   (C = -A).

The problem of maximizing (x(T), a) is equivalent to that of maximizing
\int_0^T (Az, a) dt = \int_0^T (z, a') dt, where a' = A'a. Here A' denotes the transpose
of A.

Beginning all over again, we start with the problem of maximizing
J = \int_0^T (z, a') dt over all z satisfying the constraints

(6)  (a)  z \ge 0,

     (b)  Bz + \int_0^t Cz dt_1 \le c.


Let w(t) be a non-negative vector of the same dimension as c. Then by
virtue of (6b) we have

(7)  \int_0^T (w, Bz + \int_0^t Cz dt_1) dt \le \int_0^T (w, c) dt.

Let, as above, B' denote the transpose of B. Then, as is easily seen,
(Bz, w) = (z, B'w). Integration by parts yields, for any constant matrix
C,

(8)  \int_0^T (w, \int_0^t Cz dt_1) dt = \int_0^T (\int_t^T C'w dt_1, z) dt.

Combining these two results, we have

(9)  \int_0^T (w, Bz + \int_0^t Cz dt_1) dt = \int_0^T (B'w + \int_t^T C'w dt_1, z) dt.

Let us now assume that it is possible to find a vector w = w(t) which
is non-negative and satisfies the inequality

(10)  B'w + \int_t^T C'w dt_1 \ge a'.

We then have the chain of equalities and inequalities:

(11)  \int_0^T (w, c) dt \ge \int_0^T (w, Bz + \int_0^t Cz dt_1) dt

          = \int_0^T (B'w + \int_t^T C'w dt_1, z) dt \ge \int_0^T (a', z) dt.

From this it is clear that

(12)  Inf_w \int_0^T (w, c) dt \ge Sup_z \int_0^T (z, a') dt,

where the infimum and supremum are taken over all w and z satisfying
the inequalities of (10) and (6b). If the minimum and maximum are assumed,
the details are as above. If, however, the minimum and maximum
are not assumed, then delta-functions will occur, which is to say, we must
reformulate the problems in terms of Stieltjes integrals. A number of
interesting and difficult problems arise in this way, which we shall not
discuss here.^8
If the two extremes in (12) are equal, we see that the following relations
must hold:^9

(13)  w_i = 0  if  c_i > (Bz + \int_0^t Cz dt_1)_i,

      z_i = 0  if  a_i' < (B'w + \int_t^T C'w dt_1)_i.

The important fact which we now wish to establish is that, conversely,
any pair of non-negative z and w satisfying (13) and the original constraints
will furnish solutions to the maximum and minimum problems.

To demonstrate this, let us note that if (13) holds, all the relations in
(12) are equalities. Assume now that \bar z is another vector satisfying all the
constraints and for which

(14)  \int_0^T (z, a') dt < \int_0^T (\bar z, a') dt.

Then with the w associated with z we have

(15)  \int_0^T (\bar z, a') dt \le \int_0^T (\bar z, B'w + \int_t^T C'w dt_1) dt

          = \int_0^T (B \bar z + \int_0^t C \bar z dt_1, w) dt \le \int_0^T (c, w) dt

          = \int_0^T (z, a') dt,

a contradiction.

It follows then that we have a procedure for verifying a conjectured
solution. Given z, we seek to determine w by means of (13). Having obtained
w, we test to see whether or not w satisfies the given constraints. In
the next section we shall carry through the details for the problem of § 11.
This procedure will encounter difficulties if w is not uniquely determined
by (13). In this case, various alternative solutions must be considered.


8  In particular, we shall not discuss the connection with a min-max result in
the theory of games, a result corresponding to known results for the discrete
problem.

9  Apart from sets of measure zero.
§  13.  Verification of the solution given in § 11

Applying the techniques described above, we find that the dual of the
problem posed in § 11 is the problem of minimizing \int_0^T (c_1 w_1 + c_2 w_2) dt
over w_1(t) and w_2(t), where y and w are connected by the equations

(1)  dy_1/dt = -a_1 w_1 + w_2,   y_1(T) = -1,

     dy_2/dt = -a_2 w_2,         y_2(T) = a_2,^10

and the constraints have the form

(2)  (a)  w_1, w_2 \ge 0,

     (b)  w_1 + w_2 \ge y_2,

     (c)  w_2 \ge y_1.

The equations of (12.13) are now:

(3)  (a)  if z_2 < x_1, then w_1 = 0,

     (b)  if z_1 + z_2 < x_2, then w_2 = 0,

     (c)  if w_2 > y_1, then z_1 = 0,

     (d)  if w_1 + w_2 > y_2, then z_2 = 0.

We have omitted the conditions corresponding to x_2 \ge 0 since we
suspect that the proposed optimal allocation policy automatically keeps
x_2 \ge 0. This is actually the case.

We wish to verify that the policy which maximizes x_2(T) is the following:

(4)  (a)  For T - 1/a_1 a_2 < t \le T:   z_1(t) = 0,  z_2 = Min (x_1, x_2).

     (b)  For 0 \le t \le T - 1/a_1 a_2:

          (1)  if x_2 \le x_1:   z_1 = 0,  z_2 = x_2,

          (2)  if x_1 \le x_2:   z_1 = x_2 - x_1,  z_2 = x_1.

It is easily seen that this is a permissible policy in that z_1 = x_2 - x_1 is
actually non-negative when z_1 and z_2 have the above values.

Having prescribed z, we can determine w using (3) and then test for
consistency. There are two cases to consider, depending upon whether x_2
ever exceeds x_1 or not.

Let us assume then that T > T_1, in which case x_2 can exceed x_1 if
appropriate policies are used.


10  Observe that the dual process proceeds backwards in time.
Case I: T - 1/a_1 a_2 < T_1 < T. The solution is given by

(5)  for t < T_1:     z_1 = 0,  z_2 = x_2,

     for t \ge T_1:   z_1 = 0,  z_2 = x_1.

For t < T these results yield, in conjunction with (3),

(6)  for t < T_1:     w_1(t) = 0,  w_2(t) = y_2(t),

     for t \ge T_1:   w_2(t) = 0,  w_1(t) = y_2(t).

For t > T_1 we obtain, using (1),

(7)  y_2(t) = a_2,   y_1(t) = -1 + a_1 a_2 (T - t) < 0,

while for t < T_1, we have

(8)  y_2(t) = a_2 e^{a_2 (T_1 - t)} > 0,

     y_1(t) = -1 + a_1 a_2 (T - T_1) - (e^{a_2 (T_1 - t)} - 1) < 0.

Hence, the inequalities w_1, w_2 \ge 0, w_2 \ge y_1, w_1 + w_2 \ge y_2 are satisfied in their
respective intervals.

Case II: T_1 < T - 1/a_1 a_2. This is the most interesting case. The vectors
z and w are now determined as follows:

(9)  for T - 1/a_1 a_2 \le t \le T:     z_1 = 0,          w_2 = 0,

                                        z_2 = x_1,        w_1 = y_2,

     for T_1 \le t \le T - 1/a_1 a_2:   z_1 = x_2 - x_1,  w_2 = y_1,

                                        z_2 = x_1,        w_1 = y_2 - y_1,

     for 0 \le t \le T_1:               z_1 = 0,          w_1 = 0,

                                        z_2 = x_2,        w_2 = y_2.

For T - 1/a_1 a_2 \le t \le T we have

(10)  y_2(t) = a_2,   y_1(t) = -1 + a_1 a_2 (T - t).

Hence, in this interval y_1(t) \le 0 = w_2. Note that y_1(T - 1/a_1 a_2) = 0.
In the range T_1 < t \le T - 1/a_1 a_2, we have the equations

(11)  dy_1/dt = -a_1 y_2 + (a_1 + 1) y_1,

      dy_2/dt = -a_2 y_1.

Let us show that y_1 \ge 0 and y_2 \ge y_1 in this range. Starting from
t = T - 1/a_1 a_2 where the inequalities are satisfied, let us reverse the
time. The backward equations are

(12)  dy_1/dt = a_1 y_2 - (1 + a_1) y_1,

      dy_2/dt = a_2 y_1.



From this we obtain

(13)  d/dt (y_2 - y_1) = (1 + a_2) y_1 - a_1 (y_2 - y_1).

Hence, if y_1 remains non-negative, we will have y_2 - y_1 \ge 0. It is clear
that dy_1/dt starts out positive and stays positive as long as (y_1, y_2) remains
above a_1 y_2 - (1 + a_1) y_1 = 0. If it hits the line we have dy_1/dt = 0, which
means that y_1 has a maximum or a point of inflection. Both are excluded,
since, at a point where dy_1/dt = 0,

(14)  d^2 y_1/dt^2 = a_1 dy_2/dt = a_1 a_2 y_1 > 0.

This shows that w_1 and w_2 remain non-negative in this interval.

Finally, for t < T_1 we have

(15)  dy_1/dt = y_2,   dy_2/dt = -a_2 y_2.

As t decreases, y_2 increases and y_1 decreases. Hence, y_2 \ge y_1 remains
valid.

This  completes  the  verification. 
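
A numerical cross-check of Case II is sketched below. It is an illustration only, under several assumptions: the primal dynamics are read as dx_1/dt = a_1 z_1, dx_2/dt = a_2 z_2 - z_1; the dual equations are read as dy_1/dt = -a_1 w_1 + w_2, dy_2/dt = -a_2 w_2 with y(T) = (-1, a_2); both are integrated by Euler steps; and the parameter values are invented. If the pair (z, w) satisfies the relations of (12.13), the dual integral \int_0^T (c_1 w_1 + c_2 w_2) dt should equal the primal integral \int_0^T (a_2 z_2 - z_1) dt = x_2(T) - c_2.

```python
import math

def primal_and_dual(c1, c2, a1, a2, T, steps=100000):
    """Return (x2(T) - c2, dual integral) for the conjectured policy and multipliers."""
    dt = T / steps
    t_switch = T - 1.0 / (a1 * a2)          # start of the final interval
    T1 = math.log(c1 / c2) / a2             # time at which x2 catches up with x1

    # Primal: the conjectured optimal policy, integrated forwards in time.
    x1, x2 = c1, c2
    for k in range(steps):
        t = k * dt
        if t >= t_switch or x2 <= x1:
            z1, z2 = 0.0, min(x1, x2)
        else:
            z1, z2 = x2 - x1, x1
        x1, x2 = x1 + dt * a1 * z1, x2 + dt * (a2 * z2 - z1)
    primal = x2 - c2

    # Dual: multipliers chosen as in Case II, integrated backwards from t = T.
    y1, y2 = -1.0, a2
    dual = 0.0
    for k in range(steps, 0, -1):
        t = k * dt
        if t > t_switch:
            w1, w2 = y2, 0.0
        elif t > T1:
            w1, w2 = y2 - y1, y1
        else:
            w1, w2 = 0.0, y2
        dual += dt * (c1 * w1 + c2 * w2)
        # One backward Euler step of dy1/dt = -a1*w1 + w2, dy2/dt = -a2*w2.
        y1, y2 = y1 + dt * (a1 * w1 - w2), y2 + dt * a2 * w2
    return primal, dual

c1, c2, a1, a2 = 2.0, 1.0, 0.5, 1.0         # illustrative values with c2 < c1
T = math.log(c1 / c2) / a2 + 3.0            # long enough that T_1 < T - 1/(a1*a2)
primal, dual = primal_and_dual(c1, c2, a1, a2, T)
print(primal, dual)
```

The two numbers agree up to discretization error, which is what the equality of the two extremes in (12.12) requires. The check does not replace the sign analysis above; it only confirms that the pieces fit together for one set of parameters.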

§  14.  Computational  solution 

The problem of maximizing x_N, where

(1)  x_{k+1} = a_{11} x_k + a_{12} y_k + b_{11} z_k + b_{12} w_k,   x_0 = c_1,

     y_{k+1} = a_{21} x_k + a_{22} y_k + b_{21} z_k + b_{22} w_k,   y_0 = c_2,

over sequences {z_k} and {w_k} subject to constraints of the form

(2)  d_{i1} z_k + d_{i2} w_k \le d_{i3} x_k + d_{i4} y_k,   i = 1, 2, ..., M,

may be reduced, as we know, to the computation of the sequence
{f_k(c_1, c_2)}, k = 1, 2, ..., N, where

(3)  f_1(c_1, c_2) = c_1,

     f_{N+1}(c_1, c_2) = Max_R [f_N(a_{11} c_1 + a_{12} c_2 + b_{11} z + b_{12} w,

                                   a_{21} c_1 + a_{22} c_2 + b_{21} z + b_{22} w)],

where R is the region defined by

(4)  d_{i1} z + d_{i2} w \le d_{i3} c_1 + d_{i4} c_2,   i = 1, 2, ..., M.

Although it is not difficult to show that the maximum value is attained
at a vertex of the region defined by (4), an exercise we recommend to the
reader, which means that the maximization at each stage is trivial computationally,
we are still faced with the problem of the tabulation of a
sequence of functions of two variables. What seems to make the problem
particularly onerous in this case is the fact that we have a possibly increasing
grid in the (c_1, c_2) plane. In other words, if we wish to compute
f_N(c_1, c_2) in the region 0 \le c_1 \le \bar c_1, 0 \le c_2 \le \bar c_2, we may have to calculate
f_{N-1} in a larger region, f_{N-2} in a still larger region and so on.

It is clear that whenever a situation of this type arises, we have a very
costly and time-consuming computation.

Let us now show that we can simultaneously reduce the computation
of the sequence {f_N(c_1, c_2)} to the computation of two sequences of functions
of one variable, and to the case where we have a fixed grid.

Our basic tool is the following homogeneity property of f_N(c_1, c_2),

(5)  f_N(c_1, c_2) = c_1 f_N(1, c_2/c_1)

                   = c_2 f_N(c_1/c_2, 1),

for c_1, c_2 > 0.

We may thus write (3) in the form

(6)  f_{N+1}(c_1, c_2)

       = Max_R [(a_{11} c_1 + a_{12} c_2 + b_{11} z + b_{12} w)

                f_N(1, (a_{21} c_1 + a_{22} c_2 + b_{21} z + b_{22} w)/(a_{11} c_1 + a_{12} c_2 + b_{11} z + b_{12} w))]

       = Max_R [(a_{21} c_1 + a_{22} c_2 + b_{21} z + b_{22} w)

                f_N((a_{11} c_1 + a_{12} c_2 + b_{11} z + b_{12} w)/(a_{21} c_1 + a_{22} c_2 + b_{21} z + b_{22} w), 1)].

We see then that the calculation of f_{N+1}(c_1, c_2) can be effected if we
know the two functions

(7)  g_N(x) = f_N(x, 1),   0 \le x \le 1,

     h_N(x) = f_N(1, x),   0 \le x \le 1.

Hence the computation of the sequence {f_N(c_1, c_2)} may be reduced to the
computation of the two sequences {g_N(x)}, {h_N(x)}.
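
The recurrence (3) and the homogeneity property (5) can be tried out on a small instance. The sketch below is a minimal illustration, not a recommended computational scheme: the coefficients in `A`, `B` and `D` are invented (a stockpile c_1 and a capacity c_2, with z spent on capacity and w on production), the maximum is taken only over the vertices of the region (4), as the text states is sufficient, and the recursion is evaluated directly rather than tabulated. All function names are hypothetical.

```python
from itertools import combinations

ALPHA, BETA = 1.5, 0.5                     # hypothetical production and building rates
A = ((1.0, 0.0), (0.0, 1.0))               # coefficients a_ij of (1)
B = ((-1.0, ALPHA), (BETA, 0.0))           # coefficients b_ij of (1), columns (z, w)
# Rows (d1, d2, d3, d4) encode d1*z + d2*w <= d3*c1 + d4*c2, as in (4).
D = (
    (1.0, 1.0, 1.0, 0.0),                  # z + w <= c1
    (0.0, 1.0, 0.0, 1.0),                  # w <= c2
    (-1.0, 0.0, 0.0, 0.0),                 # z >= 0
    (0.0, -1.0, 0.0, 0.0),                 # w >= 0
)

def vertices(c1, c2, tol=1e-9):
    """Feasible intersection points of pairs of boundary lines of region (4)."""
    rows = [(d1, d2, d3 * c1 + d4 * c2) for d1, d2, d3, d4 in D]
    points = []
    for (p1, p2, r), (q1, q2, s) in combinations(rows, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < tol:
            continue                       # parallel boundary lines
        z = (r * q2 - p2 * s) / det        # Cramer's rule
        w = (p1 * s - r * q1) / det
        if all(e1 * z + e2 * w <= rhs + tol for e1, e2, rhs in rows):
            points.append((z, w))
    return points

def f(n, c1, c2):
    """f_n of (3), evaluated by direct recursion over the vertices."""
    if n == 1:
        return c1
    best = float("-inf")
    for z, w in vertices(c1, c2):
        next_c1 = A[0][0] * c1 + A[0][1] * c2 + B[0][0] * z + B[0][1] * w
        next_c2 = A[1][0] * c1 + A[1][1] * c2 + B[1][0] * z + B[1][1] * w
        best = max(best, f(n - 1, next_c1, next_c2))
    return best

def g(n, x):
    return f(n, x, 1.0)                    # g_N(x) = f_N(x, 1)

def h(n, x):
    return f(n, 1.0, x)                    # h_N(x) = f_N(1, x)

def f_from_one_variable(n, c1, c2):
    """Recover f_n(c1, c2) from g_n or h_n on [0, 1] using (5)."""
    return c1 * h(n, c2 / c1) if c2 <= c1 else c2 * g(n, c1 / c2)

print(f(4, 2.0, 3.0), f_from_one_variable(4, 2.0, 3.0))
```

The two printed values coincide, which is the content of (5) and (7): a table of g_N and h_N on the fixed interval [0, 1] determines f_N everywhere in the positive quadrant.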

§  15.  Nonlinear  problems 

A variety of problems in analysis, and in applications to control problems
arising in engineering and mathematical economics, reduce to the
maximization or minimization of an integral of the form

(1)  J(z) = \int_0^T F(x_1, x_2, ..., x_n; z_1, z_2, ..., z_m) dt,

over all functions z_i(t) subject to a number of constraints of the form

(2)  (a)  dx_i/dt = G_i(x, z),   i = 1, ..., k,

     (b)  R_j(x, z) \le 0,   j = 1, 2, ..., l.

In some cases, the nonlinearity leads to a more complete analysis, since
it permits us to determine the extremal by classical variational techniques,
rather than test vertices as we must do in linear problems. In cases where
constraints of the type above enter, we must combine the two approaches.
In either situation, the functional equation technique may be utilized for
both analytic and numerical purposes.

Problems  of  this  type  will  be  discussed  in  Chapter  9. 


Exercises  and  Research  Problems  for  Chapter  VI 

1.  Consider the problem of maximizing the linear form

L(x) = \sum_{i=1}^{N} b_i x_i, subject to the constraints x_i \ge 0 and

\sum_{j=1}^{N} a_{ij} x_j \le c_i, i = 1, 2, ..., M, where we assume that the coefficients
a_{ij} and b_i are positive. Let

f_N(c_1, c_2, ..., c_M) = Max_{x_i} L(x).

Show that

f_1(c_1, c_2, ..., c_M) = b_1 Min_i c_i/a_{i1},

f_{N+1}(c_1, c_2, ..., c_M) = Max_x [b_{N+1} x + f_N(c_1 - a_{1,N+1} x, ..., c_M - a_{M,N+1} x)],

where 0 \le x \le Min_i [c_i/a_{i,N+1}].

2.  Show that f_N(c_1, c_2, ..., c_M) is a concave function of the c_i for c_i \ge 0.

3.  What conclusion can be drawn from this result concerning the number
of the maximizing x_i which are non-zero?

4.  Consider the above problem for the case where M = 1, 2, or 3, and
determine the dependence of the maximizing x_i upon the parameters c_j,
and the analytic form of f_N.

5.  Show that the tabulation of the function f_N(c_1, c_2, ..., c_M) can always
be reduced to the tabulation of the function f_N(c_1, c_2, ..., 1). Establish
the corresponding result for the bottleneck process discussed above.
6.  Consider the problem of maximizing u(T) where

du/dt = au + v,   u(0) = c,

over all functions v(t) satisfying the constraint 0 \le v \le u for 0 \le t \le T.
Here all the quantities involved are scalars.

7.  Solve the general problem of maximizing (x(T), a) where

dx/dt = Ax + By,   x(0) = c,

over all vectors y(t) satisfying the constraint 0 \le y \le x for 0 \le t \le T.
Here x, y, c and a are vectors, while A and B are matrices.

8.  Show  that  the  problem  of  maximizing  x,  ( T )  under  the  conditions 

(a)  dxxfdt  =  a,  z,  —  z„  x,  (0)  =  ct , 
dxjdt  =  yx  z3  —  yt  Zt,  xa  (0)  =  c, , 

where  z,  and  z,  are  functions  of  t  subject  to  the  restraints 

(b)  1.  ZxA-  Zx<.  x„ 

2.  Yt  zt  +  ytZx^  xa, 

3.  zt,  Zj  ^  0, 


is  equivalent  to  solving  the  partial  differential  equation 


df 

«  f- 


Max 

DU) 


{at 


dc. 


Yt 


df\ 

6cJZ-  +  b’y‘ 


;/ 

dc3 


where D(z) is the region determined by (b), under appropriate assumptions
of continuity.

All  parameters  appearing  are  assumed  to  be  non-negative  and  /  = 
/(c..  c„  t). 


9.  Show that optimal policies depend only upon the ratio r = c,/c3, or
Xj/x2, and T the time remaining.


10.  Determine  the  form  of  the  solution  for  small  T. 


11.  Solve  the  problem  in  the  special  case  where  b3  =  0. 


Bibliography  and  Comments  for  Chapter  VI 

§  1.  A discussion of the theory of linear programming may be found in
Activity Analysis of Production and Allocation, edited by T. C. Koopmans,
Cowles Commission, U. of Chicago, 1951, where there is an account of the
"simplex" technique of G. Dantzig, and a number of applications. An
account of an iterative technique of a different type, the "flooding" technique
of A. Boldyreff may be found in A. Boldyreff, Determination of the Maximal
Steady Flow of Traffic Through a Railroad Network, RAND Corporation,
P-687, 1955. Both of these are "relaxation techniques" of the kind brought
into prominence by R. V. Southwell.

§  5.  The methods and results of this and the following section were
announced in R. Bellman, "Bottleneck Problems and Dynamic Programming,"
Proc. Nat. Acad. Sci., vol. 39 (1953), and presented in detail by R. Bellman,
"Bottleneck Problems, Functional Equations and Dynamic Programming,"
Econometrica, vol. 23 (1955), pp. 73-87.

§  9.  A rigorous theory of these variational problems will involve at least
Lebesgue-Stieltjes integrals and, most likely, the theory of distributions of
L. Schwartz. It may well be that this will serve as a motivation for the study
of variational problems involving distributions.

§  12.  As  in  the  discrete  case,  the  dual  problem  is  most  logically  discussed  by 
treating  the  min-max  problem  containing  both  the  original  and  the  dual 
process.  A  number  of  results  can  be  established  concerning  the  existence 
of  a  value  of  the  corresponding  game  and  the  equivalence,  min-max  = 
max-min,  using  existing  results  in  the  theory  of  continuous  games,  in  the 
case  where  the  policy  functions  are  uniformly  bounded  as  a  consequence  of 
the constraints. The general case, however, awaits a theory of games over the
space of the distribution functions of L. Schwartz.

It  is  remarkable  that  so  much  can  be  obtained  using  only  the  easily  derived 
result  of  (12.12). 

§  13.  R. S. Lehman has found a continuous version of the "simplex"
method of Dantzig which can be used to obtain the solutions of variational
problems of this type in a systematic fashion. A preliminary account of his
results may be found in R. S. Lehman, "On the Continuous Simplex Method,"
RM-1386, RAND Corporation, 1954.
Bottleneck Problems: Examples

§  1.  Introduction 

In the previous chapter, we discussed a multi-stage production process
involving three industries, which we called the auto, steel, and tool industries.
Taking this problem as our motivation, we were led to a general
theoretical formulation of a class of continuous multi-stage production
processes in terms of the concepts and techniques of the theory of dynamic
programming.

The purpose of the present chapter is to show by grinding through the
details of a particular example that this new approach may be utilized
to provide explicit analytic solutions of problems of this general type.
The analysis is decidedly difficult and it cannot be said that these problems
have in any sense been tamed.

We  shall  consider  a  lumped  two-industry  process,  involving  what  we 
call  the  auto  and  steel  industries.  The  high  degree  of  lumping  (or  more 
pedantically  “conglomeration”)  is  indicated  by  the  fact  that  at  any  time 
t  we  assume  that  the  state  of  the  industrial  system  is  completely  specified 
by  the  following  quantities: 

(1)  $x_1(t)$ = auto stockpile at time $t$
     $x_2(t)$ = auto capacity at time $t$
     $x_3(t)$ = steel stockpile at time $t$
     $x_4(t)$ = steel capacity at time $t$

Taking $t$ to be a continuous variable, at each instant we must determine
rates of allocation of the steel stockpile towards three distinct
objectives:

(2)  a. Production of autos
     b. Building of auto factories, i.e., increase of auto capacity
     c. Building of steel mills, i.e., increase of steel capacity

The last two of these three objectives are to be subordinated to the
primary objective of maximizing the total number of autos produced over a
time-period $T$, which is to say, the quantity $x_1(T)$.

The basic assumptions of our model are the following: The measures
of stockpile and capacity are chosen so that one unit of capacity, either
auto or steel, is required for the production of one unit of stockpile in
unit time. We assume that $b_1$ units of steel are required to make one unit
of autos, $b_2$ units of steel are required to increase auto capacity by one
unit, and $b_4$ units of steel are required to increase steel capacity by one
unit. However, we shall assume that no steel is required to produce
additional steel.

A very important assumption is that there is no time-lag between
allocation and increase in capacity of production. The problems which arise
when time-lag is considered are an order of magnitude more difficult and
will not be discussed here.

Let

(3)  (a) $z_1(t)$ = rate of production of autos
     (b) $z_2(t)$ = rate of increase of auto capacity
     (c) $z_3(t)$ = rate of production of steel
     (d) $z_4(t)$ = rate of increase of steel capacity


We derive, following the lines of the argumentation of the previous
chapter, the following system of equations

$$
\begin{aligned}
dx_1/dt &= z_1(t), & x_1(0) &= c_1 \\
dx_2/dt &= z_2(t), & x_2(0) &= c_2 \\
dx_3/dt &= z_3(t) - b_1 z_1(t) - b_2 z_2(t) - b_4 z_4(t), & x_3(0) &= c_3 \\
dx_4/dt &= z_4(t), & x_4(0) &= c_4
\end{aligned}
\tag{4}
$$

where the $z_i$ and $x_i$ are subject to the following constraints

$$
\begin{aligned}
&\text{(a)}\quad z_1(t) \le x_2(t) \\
&\text{(b)}\quad z_3(t) \le x_4(t) \\
&\text{(c)}\quad z_i(t) \ge 0, \quad i = 1, 2, 3, 4 \\
&\text{(d)}\quad x_3(t) \ge 0
\end{aligned}
\tag{5}
$$


The  first  two  constraints  are  capacity  constraints,  i.e.,  limitations  of 
bottleneck  type;  the  third  is  a  statement  that  rates  of  production  must  be 
non-negative,  i.e.,  no  scrapping  or  “cannibalization,”  and  the  fourth 
asserts  that  the  steel  stockpile  must  be  non-negative,  i.e.,  no  borrowing. 
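
The system (4) with the constraints (5) can be explored numerically. The
following is a minimal sketch, not part of the original analysis: it
integrates (4) by forward Euler steps for a user-supplied allocation policy
and rejects any policy that violates (5). The names `simulate` and
`capacity_policy`, the step count, and the tolerance are illustrative
assumptions. Delta-function allocations, which the later sections require
at $t = 0$, cannot be represented in this sketch.

```python
def simulate(policy, c, b1, b2, b4, T, steps=1000, tol=1e-9):
    """Forward-Euler integration of system (4) under constraints (5).

    policy(t, x) returns the rates (z1, z2, z3, z4); c = (c1, c2, c3, c4).
    Returns the final state [x1, x2, x3, x4] at time T.
    """
    x = list(c)                      # x[0..3] hold x_1..x_4
    dt = T / steps
    for k in range(steps):
        z1, z2, z3, z4 = policy(k * dt, x)
        if min(z1, z2, z3, z4) < -tol:
            raise ValueError("(5c) violated: negative rate")
        if z1 > x[1] + tol or z3 > x[3] + tol:
            raise ValueError("(5a) or (5b) violated: capacity exceeded")
        x[0] += dt * z1
        x[1] += dt * z2
        x[2] += dt * (z3 - b1 * z1 - b2 * z2 - b4 * z4)
        x[3] += dt * z4
        if x[2] < -tol:
            raise ValueError("(5d) violated: negative steel stockpile")
    return x


def capacity_policy(b1):
    """Hypothetical policy: make steel at capacity, make autos as fast as
    auto capacity and current steel output allow, build no new capacity."""
    def policy(t, x):
        z3 = x[3]
        z1 = min(x[1], x[3] / b1)
        return z1, 0.0, z3, 0.0
    return policy


final_state = simulate(capacity_policy(1.0), (0.0, 1.0, 0.0, 2.0),
                       b1=1.0, b2=1.0, b4=1.0, T=3.0)
```

With these illustrative numbers the policy produces autos at rate 1 and
steel at rate 2, so the surplus steel simply accumulates; the optimal
policies derived below put that surplus to work.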

The problem is now to determine the $z_i(t)$, satisfying the restrictions of
(5), which maximize $x_1(T)$. Because of the lack of any explicit upper
bound on $z_2$ and $z_4$, various difficulties arise which must be surmounted
by the use of delta functions.

§  2.  Preliminaries 

In  §  1,  we  formulated  in  mathematical  terms  the  problem  of  utilizing 
the  steel  and  auto  industries  so  as  to  maximize  auto  production.  Let  us 
continue  from  equations  (1.4)  and  (1.5). 

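The equations can be combined to provide an equivalent system of
integral inequalities:

$$
\begin{aligned}
z_1 \le x_2 &: \quad z_1(t) - \int_0^t z_2(s)\,ds \le c_2, \\
0 \le x_3 &: \quad \int_0^t \bigl(-z_3(s) + b_1 z_1(s) + b_2 z_2(s) + b_4 z_4(s)\bigr)\,ds \le c_3, \\
z_3 \le x_4 &: \quad z_3(t) - \int_0^t z_4(s)\,ds \le c_4.
\end{aligned}
\tag{1}
$$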

Our problem is a special case of the following more general problem.
Let $Z$ be the set of all vector functions $z(t)$ which satisfy the conditions

$$
\begin{aligned}
&\text{(a)}\quad z(t) \ge 0 \\
&\text{(b)}\quad Bz(t) + \int_0^t Cz(s)\,ds \le c
\end{aligned}
\tag{2}
$$

where $B$ and $C$ are matrices and $c$ is a constant vector. We now wish to
find a vector function $z(t)$ in $Z$ which maximizes

$$
\int_0^T (z(t), a)\,dt. \tag{3}
$$

This is the problem we discussed in the previous chapter. It was shown
there that there is a dual problem which furnishes a sufficient condition
that a $z(t)$ belonging to $Z$ be a maximizing vector, or in other words, that
a feasible solution be optimal.

Let $W$ be the set of vector functions $w(t)$ for which

$$
\begin{aligned}
&w(t) \ge 0 \\
&B'w(t) + C'\int_t^T w(s)\,ds \ge a
\end{aligned}
\tag{4}
$$

where $B'$ and $C'$ are the transposes of $B$ and $C$. The dual problem is that
of finding the minimum of $\int_0^T (w(t), c)\,dt$ for $w \in W$.

As we showed in § 11 of Chapter 4, we have for all $z$ and $w$ in the
respective classes $Z$ and $W$, the inequality

$$
\int_0^T (z(t), a)\,dt \le \int_0^T (w(t), c)\,dt. \tag{5}
$$
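
The inequality (2.5) follows directly from the two sets of constraints. A
short sketch of the argument, assuming that the order of integration may be
interchanged:

$$
\begin{aligned}
\int_0^T (z(t), a)\,dt
&\le \int_0^T \Bigl(z(t),\; B'w(t) + C'\!\int_t^T w(s)\,ds\Bigr)\,dt \\
&= \int_0^T \Bigl(Bz(t) + \int_0^t Cz(s)\,ds,\; w(t)\Bigr)\,dt \\
&\le \int_0^T (c, w(t))\,dt .
\end{aligned}
$$

The first step uses $z \ge 0$ together with the dual constraint (2.4); the
middle step moves $B'$ and $C'$ across the inner product and interchanges the
order of integration over the region $0 \le t \le s \le T$; the last step uses
$w \ge 0$ together with (2.2b).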

If  we  can  find  two  vector  functions  z  and  w  for  which  (2.5)  holds  with 
equality,  they  must  yield  the  maximum  and  minimum,  respectively,  for 
the  two  problems.  Two  such  vector  functions  for  which  equality  holds 
will  be  said  to  be  paired  with  each  other.  Thus,  a  sufficient  condition  that 
a  z  belonging  to  Z  be  optimal  is  that  it  can  be  paired  with  some  w  in  W. 
For  the  auto-steel  problem  formulated  above  we  have 

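$$
B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad
C = \begin{pmatrix} 0 & -1 & 0 & 0 \\ b_1 & b_2 & -1 & b_4 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \qquad
a = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\tag{6}
$$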

The dual system of inequalities is therefore

$$
\begin{aligned}
l_1 &= w_2(t) + b_1 \int_t^T w_3(s)\,ds - 1 \ge 0 \\
l_2 &= -\int_t^T w_2(s)\,ds + b_2 \int_t^T w_3(s)\,ds \ge 0 \\
l_3 &= w_4(t) - \int_t^T w_3(s)\,ds \ge 0 \\
l_4 &= b_4 \int_t^T w_3(s)\,ds - \int_t^T w_4(s)\,ds \ge 0.
\end{aligned}
\tag{7}
$$

We have chosen to call the components of $w$, $w_2$, $w_3$ and $w_4$ in order to
keep the connection with the inequalities $z_1 \le x_2$, $0 \le x_3$, $z_3 \le x_4$ clear.

The optimality conditions, i.e., the conditions that (2.5) hold with
equality, are:

(8)  If $z_i(t) > 0$, then $l_i(t) = 0$  $(i = 1, 2, 3, 4)$
     If $z_1(t) < x_2(t)$, then $w_2(t) = 0$
     If $0 < x_3(t)$, then $w_3(t) = 0$
     If $z_3(t) < x_4(t)$, then $w_4(t) = 0$

The following are equivalent to the optimality conditions:

(9)  If $l_i(t) > 0$, then $z_i(t) = 0$  $(i = 1, 2, 3, 4)$
     If $w_2(t) > 0$, then $z_1(t) = x_2(t)$
     If $w_3(t) > 0$, then $0 = x_3(t)$
     If $w_4(t) > 0$, then $z_3(t) = x_4(t)$

§ 3. Delta-functions

Before we proceed to determine the solution, let us discuss the use that
we will make of delta functions. It can easily happen that the general
problems discussed above have no solutions if the sets $Z$ and $W$ are
composed only of vectors having components which are integrable
functions. In fact, as we shall later see, this is the usual case in the
auto-steel problem. This difficulty can be evaded by enlarging the sets $Z$
and $W$ so that they contain vector "functions" whose components are sums of
integrable functions and "delta functions." In these enlarged classes
the problems have solutions. By a delta function concentrated at $t_0$ with
weight $w$, which we denote by $w\,\delta(t - t_0)$, we mean an improper function
such that

$$
\int_0^t w\,\delta(s - t_0)\,\varphi(s)\,ds =
\begin{cases}
0 & \text{if } t < t_0 \\
w\,\varphi(t_0) & \text{if } t > t_0
\end{cases}
$$

for every function $\varphi$ continuous at $t_0$. (For $t = t_0$ the integral is
undefined except when $\varphi(t_0) = 0$, in which case it is defined to be 0.)
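
One way to build intuition for this definition is to replace the delta
function by a narrow rectangular pulse of the same total weight. The sketch
below is an illustration only, not the rigorous justification mentioned in
the next paragraph; the function name `pulse_integral`, the pulse width
`eps`, and the midpoint rule are assumptions made for the example.

```python
import math


def pulse_integral(phi, weight, t0, t, eps=1e-4, n=1000):
    """Integrate weight * pulse(s) * phi(s) over [0, t], where pulse has
    height 1/eps on [t0, t0 + eps] and is zero elsewhere (midpoint rule).
    Assumes t0 >= 0."""
    lo, hi = t0, min(t, t0 + eps)
    if hi <= lo:                      # the pulse lies entirely beyond t
        return 0.0
    h = (hi - lo) / n
    total = sum(phi(lo + (k + 0.5) * h) for k in range(n))
    return weight / eps * total * h


before = pulse_integral(math.cos, 2.0, t0=1.0, t=0.5)   # t < t0
after = pulse_integral(math.cos, 2.0, t0=1.0, t=2.0)    # t > t0
```

As `eps` shrinks, `after` approaches `2.0 * math.cos(1.0)`, the value
prescribed by the definition, while `before` is exactly zero.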

The  use  of  delta  functions  can  be  justified  rigorously  either  by  the 
alternative  use  of  Stieltjes  integrals,  or  by  regarding  the  delta  functions 
as  obtained  by  completing  the  space  of  integrable  functions  by  a  process 
similar  to  that  used  in  obtaining  the  real  numbers  from  the  rationals. 

The optimality conditions remain the same even when $Z$ and $W$ are
enlarged in the above way. We observe that there is no harm in the
violation of the optimality conditions at isolated points or even in sets of
measure zero when only measurable functions are allowed as components
of $z$ and $w$. But, when one of the vectors, $w$, for example, has a component
$w_i$ which is a delta function at the point $t_0$, then for a $z$ to be paired
with $w$, the corresponding optimality conditions must be satisfied at the
point $t_0$.

We  shall  find  that  we  never  have  to  use  delta  functions  concentrated 
at  any  point  other  than  0  to  obtain  an  optimal  z.  Intuitively,  this  means 
that  discontinuous  changes  are  not  necessary  except  at  the  beginning. 

§  4.  The  solution 

The procedure that we use will be to construct a number of $w$-solutions
which we can pair with $z$'s belonging to $Z$ and hence obtain solutions of
our problem. The chief difficulty occurs in constructing $w$-solutions with
suitable properties. In this we are guided by a combination of guesswork
and observation of properties that an optimal $z$ should have. Guesswork
could be eliminated at the expense of considering a very much larger
number of cases.

First of all, it is clear that we should always have $z_3 = x_4$. To produce
too much steel is not harmful. By (2.8), since $z_3 > 0$, this tells us that we
should have $l_3(t) = 0$ for all $t$; i.e., $w_4(t) = \int_t^T w_3(s)\,ds$. The
remaining inequalities of (2.7) then become

$$
\begin{aligned}
l_1 &= w_2(t) + b_1 w_4(t) - 1 \ge 0 \\
l_2 &= b_2 w_4(t) - \int_t^T w_2(s)\,ds \ge 0 \\
l_4 &= b_4 w_4(t) - \int_t^T w_4(s)\,ds \ge 0.
\end{aligned}
\tag{1}
$$

Shortly before $T$ it is clear that we should be producing autos since $x_1$
is the quantity we wish to maximize at time $T$. Hence, we will have
$z_1 > 0$, which implies that $l_1 = 0$. This alone will not give us sufficient
information to determine $w_2$ and $w_4$.

We first construct a $w$ solution, which we shall call the basic $w$-solution,
with the property that $l_2 = 0$ near the end. This means that we must have
$w_4(T) = 0$. Then by (4.1) we have

$$
\begin{aligned}
w_4(t) &= \frac{1}{b_1}\bigl(1 - e^{-b_1(T-t)/b_2}\bigr) \\
w_2(t) &= e^{-b_1(T-t)/b_2}.
\end{aligned}
\tag{2}
$$

We see that $w_2$, $w_3$, and $w_4$ all remain positive as $t$ decreases. We must
check to see whether the inequality $l_4 \ge 0$ is satisfied. With the above
choice of $w$ we have

$$
l_4 = \frac{b_4}{b_1}\bigl(1 - e^{-b_1(T-t)/b_2}\bigr) - \frac{T-t}{b_1}
      + \frac{b_2}{b_1^2}\bigl(1 - e^{-b_1(T-t)/b_2}\bigr).
\tag{3}
$$

The quantity on the right side of this equation is positive for $T - t$ small
but is negative when $T - t$ is large. Let $t_0$ be the value of $t$ for which the
right side becomes zero. Then $T - t_0$ is the solution of the equation

$$
T - t_0 = \Bigl(b_4 + \frac{b_2}{b_1}\Bigr)\bigl(1 - e^{-b_1(T-t_0)/b_2}\bigr).
\tag{4}
$$

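Thus we see that at $t_0$ we must abandon one of the equations $l_1 = 0$
and $l_2 = 0$. Let us try to choose $w$ so that $l_1 = 0$ and $l_4 = 0$ before
$t_0$. We have

$$
\begin{aligned}
w_4(t) &= w_4(t_0)\,e^{(t_0 - t)/b_4} \\
w_2(t) &= 1 - b_1 w_4(t_0)\,e^{(t_0 - t)/b_4}.
\end{aligned}
\tag{5}
$$

To verify that $l_2 \ge 0$ we compute its derivative. We find

$$
\frac{dl_2}{dt} = b_2 \frac{dw_4}{dt} + w_2
 = 1 - \Bigl(\frac{b_2}{b_4} + b_1\Bigr) w_4(t_0)\,e^{(t_0 - t)/b_4}.
\tag{6}
$$

Since $l_2(t_0) = 0$, a sufficient condition that $l_2 \ge 0$ is that
$dl_2/dt \le 0$ for all $t \le t_0$, i.e.,

$$
w_4(t_0) \ge \frac{b_4}{b_2 + b_1 b_4}
\tag{7}
$$

which by (4.2) and (4.4) is equivalent to $T - t_0 \ge b_4$. This last inequality
can be checked by putting $b_4$ in place of $T - t$ in (4.3) and verifying that
the quantity thus obtained is positive. We have

$$
\begin{aligned}
l_4\big|_{T - t = b_4}
&= \frac{1}{b_1}\Bigl[\Bigl(b_4 + \frac{b_2}{b_1}\Bigr)\bigl(1 - e^{-b_1 b_4/b_2}\bigr) - b_4\Bigr] \\
&= \frac{b_2}{b_1^2}\,e^{-b_1 b_4/b_2}\Bigl[e^{b_1 b_4/b_2} - \Bigl(1 + \frac{b_1 b_4}{b_2}\Bigr)\Bigr] > 0.
\end{aligned}
\tag{8}
$$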

Hence $l_2 \ge 0$ for all $t \le t_0$ with the above choice of $w$.

We also see from (4.5) that $w_3$ and $w_4$ remain positive. Thus (4.5) will
give a satisfactory choice of $w$ until $w_2$ becomes zero. Let $t_1$ be the value
of $t$ when this happens. Then, by (4.2) and (4.4) we have

$$
e^{(t_0 - t_1)/b_4} = \frac{b_4 + b_2/b_1}{T - t_0}.
\tag{9}
$$

Before $t_1$ let us see whether we can choose $w_2 = 0$ and have $l_4 = 0$. We
see that $w_3 > 0$ and $w_4 > 0$. We have $dl_1/dt = b_1\,dw_4/dt < 0$ so that
$l_1 > 0$, and $dl_2/dt = b_2\,dw_4/dt < 0$ so that $l_2 > 0$. Hence this choice of $w$
will be valid for all $t \le t_1$.
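
The two critical lags $T - t_0$ and $T - t_1$ depend only on $b_1$, $b_2$, $b_4$ and
are easy to compute. The sketch below is an illustrative aid, not part of
the original text: it solves (4.4) by bisection on the bracket
$[b_4,\; b_4 + b_2/b_1]$, which is valid because (4.8) shows the defect is
positive at $T - t = b_4$ and it is negative at the upper end, and then
obtains $T - t_1$ from (4.9). The function name `critical_lags` and the fixed
iteration count are assumptions.

```python
import math


def critical_lags(b1, b2, b4):
    """Return (T - t_0, T - t_1) as defined by (4.4) and (4.9)."""
    K = b4 + b2 / b1

    def g(tau):
        # b1 times the right side of (4.3), written with tau = T - t
        return K * (1.0 - math.exp(-b1 * tau / b2)) - tau

    lo, hi = b4, K                    # g(lo) > 0 by (4.8), g(hi) < 0
    for _ in range(200):              # bisection; stalls harmlessly at machine precision
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    tau0 = 0.5 * (lo + hi)
    tau1 = tau0 + b4 * math.log(K / tau0)   # from (4.9)
    return tau0, tau1


tau0, tau1 = critical_lags(1.0, 1.0, 1.0)
```

For a given horizon $T$ the switching times are then $t_0 = T - $ `tau0` and
$t_1 = T - $ `tau1`; a negative value means the corresponding phase of the
basic solution is never reached.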

Our  basic  solution  is  summarized  in  the  following  table.  This  table  also 
lists  the  properties  that  a  z  paired  with  this  w  solution  must  have.  Any  z 
with  these  properties  gives  a  policy  which,  if  feasible  (i.e.,  satisfies  the  z 
constraints),  is  optimal. 


  $t < t_1$                  $t_1 < t < t_0$            $t_0 < t < T$

  $l_1 > 0$,  $z_1 = 0$      $l_1 = 0$                  $l_1 = 0$
  $l_2 > 0$,  $z_2 = 0$      $l_2 > 0$,  $z_2 = 0$      $l_2 = 0$
  $l_3 = 0$                  $l_3 = 0$                  $l_3 = 0$
  $l_4 = 0$                  $l_4 = 0$                  $l_4 > 0$,  $z_4 = 0$
  $w_2 = 0$                  $w_2 > 0$,  $z_1 = x_2$    $w_2 > 0$,  $z_1 = x_2$
  $w_3 > 0$,  $x_3 = 0$      $w_3 > 0$,  $x_3 = 0$      $w_3 > 0$,  $x_3 = 0$
  $w_4 > 0$,  $z_3 = x_4$    $w_4 > 0$,  $z_3 = x_4$    $w_4 > 0$,  $z_3 = x_4$

                              Figure 1


Let us see how this table can be used to obtain a partial solution of the
auto-steel problem. For the moment let us assume $c_3 = 0$. For $t < t_1$ we
must have

$$
z_1 = 0, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = x_4/b_4.
\tag{11}
$$

For $t_1 < t < t_0$ we must choose

$$
z_1 = x_2, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = \frac{x_4 - b_1 x_2}{b_4}.
\tag{12}
$$

This can be done if and only if $x_4(t_1) - b_1 x_2(t_1) \ge 0$. Let us assume that
this inequality is satisfied. Then for $t_0 < t < T$ we must have

$$
z_1 = x_2, \quad z_2 = \frac{x_4 - b_1 x_2}{b_2}, \quad z_3 = x_4, \quad z_4 = 0
\tag{13}
$$

which is possible provided $x_4(t_1) - b_1 x_2(t_1) \ge 0$. Thus we see that for
certain initial conditions we can obtain the optimal solution.


§  5.  The  modified  w  solution 

As already has been noted, we run into trouble if $x_4(t_1) - b_1 x_2(t_1) < 0$.
To handle this case we consider a modification of the basic $w$ solution
of Fig. 1 above. Let $u_0$ be in the interval $[t_1, T]$. For each such $u_0$
we define a solution as follows:

For $u_0 < t < T$ we let $w(t)$ be the same as in the basic solution. For
$t < u_0$ we choose $w_2(t) = 0$. For $t < u_0$ but near $u_0$ we choose
$w_4(t) = 1/b_1$ so that $l_1 = 0$. This choice will keep $l_4 > 0$ for a while
before $u_0$. We define $u_1$ to be the point where $l_4$ becomes 0 with this
choice of $w$. For $t < u_1$ we choose $w$ so that $l_4 = 0$. It is easily seen that
this choice makes $l_1 > 0$, $l_2 > 0$, $w_3 > 0$ and $w_4 > 0$ for all $t < u_1$.
Hence, in this way we obtain a $w$ solution for each $u_0$ in the interval
$[t_1, T]$. We observe that for $u_0 = t_1$, $u_1 = t_1$ and this solution is
identical with our basic solution of Fig. 1. Note that $u_1$ depends
continuously on $u_0$. Since for $u_0 = T$, $u_1 = T - b_4$, there is a $w$ solution
for each $u_1$ in the interval $[t_1, T - b_4]$.

These  w  solutions  together  with  the  properties  of  the  corresponding  z 
solutions,  are  summarized  in  the  following  table: 


  $t < u_1$                 $u_1 < t < u_0$           $u_0 < t < t_0$           $t_0 < t < T$, $u_0 < t < T$

  $l_1 > 0$,  $z_1 = 0$     $l_1 = 0$                 $l_1 = 0$                 $l_1 = 0$
  $l_2 > 0$,  $z_2 = 0$     $l_2 > 0$,  $z_2 = 0$     $l_2 > 0$,  $z_2 = 0$     $l_2 = 0$
  $l_3 = 0$                 $l_3 = 0$                 $l_3 = 0$                 $l_3 = 0$
  $l_4 = 0$                 $l_4 > 0$,  $z_4 = 0$     $l_4 = 0$                 $l_4 > 0$,  $z_4 = 0$
  $w_2 = 0$                 $w_2 = 0$                 $w_2 > 0$,  $z_1 = x_2$   $w_2 > 0$,  $z_1 = x_2$
  $w_3 > 0$,  $x_3 = 0$     $w_3 = 0$                 $w_3 > 0$,  $x_3 = 0$     $w_3 > 0$,  $x_3 = 0$
  $w_4 > 0$,  $z_3 = x_4$   $w_4 > 0$,  $z_3 = x_4$   $w_4 > 0$,  $z_3 = x_4$   $w_4 > 0$,  $z_3 = x_4$

  Since $w_3$ has a delta function at $u_0$, we must have $x_3(u_0) = 0$.

                              Figure 2

Note that if $u_0 > t_0$, then there is no $t$ satisfying the conditions of the
third column; and if $u_0 = T$ then there is no $t$ satisfying the conditions of
the last column either.


§  6.  The  equilibrium  solution 

A policy which seems plausible in some instances is the following:
Make an initial adjustment to bring $x_3$ down to zero in such a way that
after the adjustment $x_4 = b_1 x_2$. If this is done, after the initial
adjustment no increase in capacities is necessary and all available steel
can be used for auto production. Such a policy would require for the $w$
paired with it that $l_2(0) = 0$ and $l_4(0) = 0$, because in general both $z_2$
and $z_4$ will have to be delta functions. We shall construct a $w$ solution
with this property.

First, we note that our basic $w$ solution has this property when $T$ is
such that $t_0 = 0$. This suggests that we try to choose

$$
w_4(t) = \alpha\,e^{-b_1(T-t)/b_2} + \beta
\tag{1}
$$

where $\alpha$ and $\beta$ are constants. If $w_2$ is chosen so that $l_1 = 0$, the
inequalities (4.1) become

$$
\begin{aligned}
l_2 &= b_2 w_4(t) - (T - t) + b_1 \int_t^T w_4(s)\,ds \ge 0 \\
l_4 &= b_4 w_4(t) - \int_t^T w_4(s)\,ds \ge 0.
\end{aligned}
\tag{2}
$$

If $l_2(0) = l_4(0) = 0$, then

$$
w_4(0) = \frac{T}{b_2 + b_1 b_4}.
\tag{3}
$$

We set $E = e^{-b_1 T/b_2}$ and from (6.1)-(6.3) derive the following two
equations for $\alpha$ and $\beta$:

$$
\begin{aligned}
b_2\,\alpha + (b_2 + b_1 T)\,\beta &= T \\
(b_2 + b_1 b_4) E\,\alpha + (b_2 + b_1 b_4)\,\beta &= T.
\end{aligned}
\tag{4}
$$

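A solution of these equations will give a $w$ for which $l_2(0) = l_4(0) = 0$.
We have

$$
\begin{aligned}
\alpha &= \frac{1}{\Delta}
\begin{vmatrix} T & b_2 + b_1 T \\ T & b_2 + b_1 b_4 \end{vmatrix}
= \frac{T}{\Delta}\bigl[b_1 (b_4 - T)\bigr] \\
\beta &= \frac{1}{\Delta}
\begin{vmatrix} b_2 & T \\ (b_2 + b_1 b_4) E & T \end{vmatrix}
= \frac{T}{\Delta}\bigl[b_2 (1 - E) - b_1 b_4 E\bigr]
\end{aligned}
\tag{5}
$$

where

$$
\Delta =
\begin{vmatrix} b_2 & b_2 + b_1 T \\ (b_2 + b_1 b_4) E & b_2 + b_1 b_4 \end{vmatrix}
= (b_2 + b_1 b_4)(b_2 - b_2 E - b_1 E T).
\tag{6}
$$

Also

$$
\begin{aligned}
\Delta &= (b_2 + b_1 b_4)\,E\,\bigl(b_2 e^{b_1 T/b_2} - b_2 - b_1 T\bigr) \\
&> (b_2 + b_1 b_4)\,E\,(b_2 + b_1 T - b_2 - b_1 T) = 0.
\end{aligned}
\tag{7}
$$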

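Now let us assume that $T - t_0 \ge T \ge b_4$. Then from (6.5) we see that
$\alpha \le 0$. Let us check to be sure that for the $w$ we have defined $w_2(t)$,
$w_3(t)$, and $w_4(t)$ are non-negative for $0 \le t \le T$. This is equivalent to
verifying that $0 \le w_4(t) \le 1/b_1$ and $dw_4/dt \le 0$. We have
$dw_4/dt = \alpha\,(b_1/b_2)\,e^{-b_1(T-t)/b_2} \le 0$. Hence it will be sufficient to
check that $w_4(T) \ge 0$ and $w_4(0) \le 1/b_1$. Since $T - t_0 \ge T$, we conclude
from (4.4) and (6.3) that

$$
w_4(0) = \frac{T}{b_2 + b_1 b_4} \le \frac{T - t_0}{b_2 + b_1 b_4}
< \frac{b_4 + b_2/b_1}{b_2 + b_1 b_4} = \frac{1}{b_1}
\tag{8}
$$

and

$$
w_4(T) = \alpha + \beta
= \frac{T}{\Delta}\,b_1\Bigl[\Bigl(b_4 + \frac{b_2}{b_1}\Bigr)(1 - E) - T\Bigr] \ge 0.
\tag{9}
$$

We also must check that for $0 \le t \le T$, $l_2 \ge 0$ and $l_4 \ge 0$.
Since

$$
\frac{dl_2}{dt} = b_2 \frac{dw_4}{dt} + 1 - b_1 w_4(t) = 1 - b_1 \beta
\tag{10}
$$

is constant, $l_2(0) = 0$, and $l_2(T) = b_2 w_4(T) \ge 0$, we have $l_2 \ge 0$ for
all $t$ in $[0, T]$. Similarly, we know that $l_4(0) = 0$ and
$l_4(T) = b_4 w_4(T) \ge 0$. Hence, if we show that $d^2 l_4/dt^2 \le 0$,
we will have proved that $l_4 \ge 0$ for all $t$ in $[0, T]$. We have

$$
\frac{d^2 l_4}{dt^2} = b_4 \frac{d^2 w_4}{dt^2} + \frac{dw_4}{dt}
= \alpha\,\frac{b_1}{b_2}\Bigl(1 + \frac{b_1 b_4}{b_2}\Bigr) e^{-b_1(T-t)/b_2} \le 0.
\tag{11}
$$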

This completes the proof that the $w$ which we have defined is a solution.
Its properties, together with those a $z$ paired with it must have, are
summarized in the following table:

  $t = 0$                    $0 < t < T$

  $l_1 = 0$                  $l_1 = 0$
  $l_2 = 0$                  $l_2 > 0$,  $z_2 = 0$
  $l_4 = 0$                  $l_4 > 0$,  $z_4 = 0$
                             $w_2 > 0$,  $z_1 = x_2$
                             $w_3 > 0$,  $x_3 = 0$
                             $w_4 > 0$,  $z_3 = x_4$

  Note. This solution is valid only for $T > b_4$.

                              Figure 3
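
The constants of the equilibrium solution can be checked numerically. The
following sketch is illustrative only: it evaluates $\alpha$, $\beta$ from (6.5)
and (6.6), builds $w_4$ from (6.1), and evaluates $l_2$ and $l_4$ from (6.2)
using the closed form of $\int_t^T w_4(s)\,ds$. The function name
`equilibrium_solution` and the sample parameter values are assumptions;
the sample horizon is chosen so that $T - t_0 \ge T \ge b_4$.

```python
import math


def equilibrium_solution(b1, b2, b4, T):
    """Return (alpha, beta, w4, l2, l4) for the equilibrium w solution."""
    E = math.exp(-b1 * T / b2)
    delta = (b2 + b1 * b4) * (b2 - b2 * E - b1 * E * T)      # (6.6)
    alpha = T * b1 * (b4 - T) / delta                        # (6.5)
    beta = T * (b2 * (1.0 - E) - b1 * b4 * E) / delta        # (6.5)

    def w4(t):                                               # (6.1)
        return alpha * math.exp(-b1 * (T - t) / b2) + beta

    def int_w4(t):
        # closed form of the integral of w4 over [t, T]
        return (alpha * (b2 / b1) * (1.0 - math.exp(-b1 * (T - t) / b2))
                + beta * (T - t))

    def l2(t):                                               # (6.2)
        return b2 * w4(t) - (T - t) + b1 * int_w4(t)

    def l4(t):                                               # (6.2)
        return b4 * w4(t) - int_w4(t)

    return alpha, beta, w4, l2, l4


alpha, beta, w4, l2, l4 = equilibrium_solution(1.0, 1.0, 1.0, T=1.2)
```

With $b_1 = b_2 = b_4 = 1$ the root of (4.4) is roughly 1.59, so $T = 1.2$ lies
in the admissible range; both $l_2(0)$ and $l_4(0)$ then vanish to rounding
error and $w_4(0) = T/(b_2 + b_1 b_4)$, as (6.3) requires.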


§  7.  A  short-time  w  solution 

The $w$ solution which we construct next will be useful in finding the
solution of our maximum problem when the total time is short, $T < b_4$.
This solution differs from those already constructed in that it allows $x_3$
to be positive and $z_2$ to be a delta function concentrated at 0.

For $0 \le t < T$ let $w_4(t) = \gamma$, $w_2(t) = 1 - b_1 \gamma$ where
$0 < \gamma < 1/b_1$. Then $l_1(t) = 0$, $l_4(t) > 0$ for $0 \le t < T$. Also

$$
l_2(t) = b_2 \gamma - (T - t)(1 - b_1 \gamma) = [b_2 + b_1 (T - t)]\,\gamma - (T - t).
\tag{1}
$$

Now, if we choose $\gamma = T/(b_2 + b_1 T)$ then $l_2(0) = 0$ and $l_2(t) > 0$ for
$t > 0$. Thus we obtain a solution of the system of inequalities (4.1). It is
summarized below together with the properties a $z$ paired with it must have.


  $t = 0$                    $0 < t < T$

  $l_1 = 0$                  $l_1 = 0$
  $l_2 = 0$                  $l_2 > 0$,  $z_2 = 0$
  $l_4 > 0$,  $z_4 = 0$      $l_4 > 0$,  $z_4 = 0$
                             $w_2 > 0$,  $z_1 = x_2$
                             $w_3 = 0$
                             $w_4 > 0$,  $z_3 = x_4$

  Since $w_3$ is a delta function at $T$, $x_3(T) = 0$.

  Note. This solution is valid for $T < b_4$ only.

                              Figure 4

§ 8. Description of solution and proof

We now can give the complete solution to the original problem. There
are quite a few cases that we must consider separately. The critical values
$t_0$ and $t_1$, which are defined by (4.4) and (4.9), depend on $T$, but in such a
way that for fixed $b_1$, $b_2$ and $b_4$, $T - t_0$ and $T - t_1$ are constants.

Case I: $T$ is large enough so that $t_1 \ge 0$. In this case we choose $z_4$ to be
a delta function concentrated at 0 to bring $x_3$ down to zero immediately.
This means that if the total time is long enough we should not keep any
steel in storage but should be using it to build more steel plants. The use
of the delta function is permissible because $l_4 = 0$ for $t$ near 0. For
$0 < t < t_1$ we let

$$
z_1 = 0, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = x_4/b_4
\tag{1}
$$

thus keeping $x_3$ at zero level. At $t_1$ we must distinguish different subcases:

(2)  IA:  $x_4(t_1) - b_1 x_2(t_1) \ge 0$
     IB:  $x_4(t_1) - b_1 x_2(t_1) < 0$

In Case IA we can produce autos at capacity without running out of
steel. Hence we let

$$
z_1 = x_2, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = \frac{x_4 - b_1 x_2}{b_4}
\tag{3}
$$

for $t_1 \le t \le t_0$; and for $t_0 < t \le T$ we let

$$
z_1 = x_2, \quad z_2 = \frac{x_4 - b_1 x_2}{b_2}, \quad z_3 = x_4, \quad z_4 = 0.
\tag{4}
$$

This solution for Case IA is optimal because it can be paired with our
basic $w$ solution of Fig. 1.

In Case IB we do not have enough steel to produce autos at capacity.
Hence we continue to produce no autos for $t > t_1$, i.e.,

$$
z_1 = 0, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = x_4/b_4.
\tag{5}
$$

We do this until $x_4 - b_1 x_2$ becomes zero or $t = T - b_4$, whichever
happens first. If $x_4 - b_1 x_2$ becomes zero at $t'$ then we choose $z_1 = x_2$,
$z_2 = 0$, $z_3 = x_4$, $z_4 = 0$ thereafter. This solution is seen to be optimal by
pairing it with the $w$ solution of Fig. 2 for which $u_1 = t'$. As we have
already remarked there is such a solution no matter what $t'$ is, so long as
$t_1 < t' \le T - b_4$. If, on the other hand, $x_4(T - b_4) - b_1 x_2(T - b_4) < 0$,
then for $T - b_4 < t \le T$ we choose

$$
z_1 = x_4/b_1, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = 0.
\tag{6}
$$

This solution can be seen to be optimal by pairing it with the $w$ solution
of Fig. 2, for which $u_0 = T$, $u_1 = T - b_4$.

Case II: $T$ is such that $t_1 \le 0 \le t_0$. As before we choose $z_4$ to be a
delta function concentrated at 0 to bring $x_3$ down to zero immediately.
Thereafter the solution is as before. There are two subcases:

(7)  IIA:  $x_4(0) - b_1 x_2(0) \ge 0$
     IIB:  $x_4(0) - b_1 x_2(0) < 0$

In Case IIA we let $z_1 = x_2$, i.e., produce autos at capacity. We use the
remaining steel to increase steel capacity before $t_0$ and to increase auto
capacity after $t_0$. That is, for $0 < t < t_0$ we let

$$
z_1 = x_2, \quad z_2 = 0, \quad z_3 = x_4, \quad z_4 = \frac{x_4 - b_1 x_2}{b_4}
\tag{8}
$$

and for $t > t_0$ we let

$$
z_1 = x_2, \quad z_2 = \frac{x_4 - b_1 x_2}{b_2}, \quad z_3 = x_4, \quad z_4 = 0.
\tag{9}
$$

This solution is optimal because it can be paired with our basic solution
of Fig. 1.

Case IIB is similar to IB. The same prescription holds, and the solution
is paired with one from Fig. 2.

Case III: $T$ is such that $t_0 \le 0 \le T - b_4$. There are three subcases:

(10)  IIIA:  $c_4 - b_1 c_2 \ge \dfrac{b_1 c_3}{b_2}$

      IIIB:  $c_4 - b_1 c_2 < -\dfrac{c_3}{b_4}$

      IIIC:  $-\dfrac{c_3}{b_4} \le c_4 - b_1 c_2 < \dfrac{b_1 c_3}{b_2}$

In Case IIIA we use our initial stockpile of steel to increase auto
capacity, i.e., we let $z_2$ be a delta function concentrated at 0 bringing $x_3$
down to zero. Thereafter, we let $z_1 = x_2$ and use any remaining steel to
increase auto capacity, i.e.,

$$
z_1 = x_2, \quad z_2 = \frac{x_4 - b_1 x_2}{b_2}, \quad z_3 = x_4, \quad z_4 = 0.
\tag{11}
$$

This solution is optimal because it can be paired with the basic $w$ solution
of Fig. 1.

In Case IIIB we find ourselves short on steel capacity. The policy and
proof are the same as in Case IB.

In Case IIIC we can make an initial adjustment so that $x_3$ becomes zero
and $x_4 = b_1 x_2$. We do this by choosing $z_2$ and $z_4$ to be delta functions
concentrated at 0. After that we let $z_1 = x_2$, $z_2 = 0$, $z_3 = x_4$, $z_4 = 0$.
This solution is optimal because it can be paired with the equilibrium $w$
solution of Fig. 3.

Case IV: $T \le b_4$. There are three subcases which depend on the initial
values:

(12)  IVA:  $c_4 - b_1 c_2 \ge \dfrac{b_1 c_3}{b_2}$

      IVB:  $c_4 - b_1 c_2 \le -\dfrac{c_3}{T}$

      IVC:  $-\dfrac{c_3}{T} < c_4 - b_1 c_2 < \dfrac{b_1 c_3}{b_2}$

In Case IVA the solution and proof are the same as in Case IIIA.

In Case IVB we choose $z_2 = 0$ and $z_4 = 0$ for all $t$. As always we let
$z_3 = x_4$. We choose $z_1$ in any way such that $z_1(t) \le x_2(t)$ and
$x_3(T) = 0$. Thus, in this case the solution is not unique. Any solution of
this form can be seen to be optimal by pairing it with the $w$ solution of
Fig. 2 for which $u_0 = T$.

In Case IVC we find ourselves in an intermediate case, unable to follow
the policies suggested by IVA and IVB. In this case we make an initial
adjustment of the steel stockpile down to the value $c_3'$, using this steel to
increase auto capacity. Thereafter we choose $z_1 = x_2$, $z_2 = 0$, $z_3 = x_4$,
and $z_4 = 0$. The value $c_3'$ is determined so that $x_3(T) = 0$. It is found that

$$
c_3' = T\,\frac{b_1 c_3 - b_2 (c_4 - b_1 c_2)}{b_2 + b_1 T}
\tag{13}
$$

has this property. This solution is optimal because it can be paired with
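the short-time $w$ solution of Fig. 4.

Summary: Initial Adjustments

        Cases I ($t_1 > 0$) and        Case III                        Case IV
        II ($t_1 < 0 < t_0$)           ($t_0 < 0 < T - b_4$)           ($T < b_4$)

  A     Adjust $x_3$ to 0 by           Bring $x_3$ to 0 by             Bring $x_3$ to 0 by
        increasing $x_4$.              increasing $x_2$.               increasing $x_2$.
        "Build steel capacity."        "Build auto capacity."          "Build auto capacity."

  B     Adjust $x_3$ to 0 by           Adjust $x_3$ to 0 by            No initial
        increasing $x_4$.              increasing $x_4$.               adjustments.
        "Build steel capacity."        "Build steel capacity."

  C     No case.                       Adjust so that $x_3 = 0$,       Adjust $x_3$ downward,
                                       $x_4 = b_1 x_2$, by             but not to 0, so that
                                       increasing $x_2$ and $x_4$.     $x_3(T) = 0$.
                                                                       Increase $x_2$.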
After the initial adjustments the optimal policy can be determined by
a priority system. Before $t_1$, building steel capacity, i.e., $z_4$, has first
priority. This continues after $t_1$ until either $x_4 \ge b_1 x_2$ or $t = T - b_4$,
whichever comes first. When this happens, which may be at $t_1$ of course,
first priority is given to auto production, $z_1$. This will use up all
available steel unless $x_4(t_1) > b_1 x_2(t_1)$. In that case second priority is
given to building steel capacity until the time $t_0$. After $t_0$ second
priority is given to building auto capacity.

A Continuous Stochastic Decision Process

§  1.  Introduction 

As we have seen in Chapter II, the formulation of the goldmining
problem in its discrete form leads to a number of unsolved problems in
connection with the three-choice problem, the non-linear utility problem,
and many others we could formulate. We turn, therefore, to a continuous
version of the problem in the hopes of overcoming these difficulties by use
of the more powerful tools of continuity. As we shall see, we can now
resolve the corresponding questions in complete detail and thereby obtain
a clear insight into the structure of optimal policies. The information we
obtain concerning the structure of policies can now be used to furnish
useful approximations to the original discrete process.

One very interesting and significant fact emerges. Whereas the original
discrete problem had certain linear aspects which made variational
analysis difficult, at least in the case where we considered expected return,
the continuous version is sufficiently non-linear to permit us to employ a
variational approach in the classical manner, with certain modifications
required by the presence of constraints. However, in carrying through
this approach, our knowledge of the form of the solution for the discrete
case is of great service in telling us in advance what to expect to find. It
is a combination of the two techniques, old and new, which permits a
successful attack upon the problem.

Before turning to the method we shall actually employ, we shall discuss
two alternative approaches, each possessing certain features of difficulty
which render them inappropriate.

It is perhaps equally as important to know which methods fail, and
why, as it is to know methods which work. In more general decision
processes of this type, a correct formulation of a continuous version is not
trivial. Particularly is this true in the case of multi-stage games of
continuous type.

There are many different possible formulations, and the correctness of
an approach must be judged not only on the grounds of its mathematical
rigor, but also on the grounds of analytic difficulty. If we do not have a
systematic means of resolving specific problems, we do not have a
satisfactory theory.

After this preliminary discussion, we shall turn to the approach we
shall actually employ, which is a compromise between the two preliminary
methods.

A  justification  of  our  approach  lies  in  the  fact  that  we  can  demonstrate 
that  the  limit  of  the  discrete  process,  in  a  suitable  sense,  is  the  continuous 
process  we  discuss.  We  shall,  however,  not  discuss  in  this  volume  these 
important  and  interesting  questions. 


§  2.  Continuous  versions — I :  A  differential  approach 

Let  us  now  proceed  to  discuss  some  possible  continuous  analogues  of 
the  functional  equation  of  (5.1)  of  Chapter  II. 

Our  basic  assumption  in  this  and  the  following  sections  will  be  that 
each  operation  is  to  have  a  high  probability  of  obtaining  a  small  amount 
of gold and leaving the machine undamaged. In other words, we renounce
any hope of solving our problem for all values of the parameters,
and consider, instead, a small region of the parameter space $(r_1, r_2, q_1, q_2)$.

We introduce the quantities

$1 - q_1 \delta$ = the probability of obtaining $r_1 x \delta$ and leaving the machine
undamaged if Anaconda is mined,

$1 - q_2 \delta$ = the probability of obtaining $r_2 y \delta$ and leaving the machine
undamaged if Bonanza is mined,

where $q_1$ and $q_2$ are positive and $\delta$ is a small enough positive quantity so
that $1 - q_1 \delta$ and $1 - q_2 \delta$ are probabilities, and $r_1 \delta$ and $r_2 \delta$ are less
than one.

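With $f(x, y)$ as before, we have the functional equation

(1)  $f(x, y) = \operatorname{Max} \begin{cases} (1 - q_1 \delta)\,\bigl(r_1 x \delta + f(x - r_1 x \delta,\ y)\bigr) \\ (1 - q_2 \delta)\,\bigl(r_2 y \delta + f(x,\ y - r_2 y \delta)\bigr) \end{cases}$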


This equation is precisely (5.1) of Chapter II for these new parameters.
Proceeding formally, on the assumption that $f$ has continuous partial
derivatives, we have, for small $\delta$, the approximate equation

(2)  $f(x, y) = \operatorname{Max} \begin{cases} f(x, y) + \delta\,\bigl(r_1 x - q_1 f(x, y) - r_1 x\, \partial f/\partial x\bigr) + O(\delta^2) \\ f(x, y) + \delta\,\bigl(r_2 y - q_2 f(x, y) - r_2 y\, \partial f/\partial y\bigr) + O(\delta^2) \end{cases}$


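The limiting form as $\delta \to 0$ is the equation

(3)  $0 = \operatorname{Max} \begin{cases} A:\ r_1 x - q_1 f - r_1 x\, \partial f/\partial x \\ B:\ r_2 y - q_2 f - r_2 y\, \partial f/\partial y \end{cases}$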


This approach does not seem to be a fruitful one because of the
difficulty of establishing existence and uniqueness theorems for functional
equations of this type.

§ 3. Continuous versions — II : An integral approach

Let us now consider a diametrically opposed approach. Let $S_N$ denote
some sequence of A (or Anaconda)-choices and B (or Bonanza)-choices
totalling $N$ in number. Set

$p_{Nk}(x, y)$ = the probability of surviving $N$ stages and ending in the
state represented by $(x_{Nk}, y_{Nk})$, using $S_N$, upon starting
in state $(x, y)$,

$R_N(x, y)$ = expected return from $N$ stages using $S_N$, starting in state
$(x, y)$.

If $S_N$ actually consists of the first $N$ choices of an optimal policy, we
obtain for $f(x, y)$ the functional equation

(1)  $f(x, y) = R_N(x, y) + \sum_k p_{Nk}(x, y)\, f(x_{Nk}, y_{Nk})$

If $N\delta$, where $\delta$ is as above, is chosen to remain finite as $\delta \to 0$ and
$N \to \infty$, and set equal to $t$, the analogue of (2.1) is a functional equation
of the type

(2)  $f(x, y) = \operatorname{Max}_{S} \left[ R_S(x, y, t) + \int_{r=0}^{1} \int_{s=0}^{1} f(xr, ys)\, dG_S(r, s, x, y, t) \right]$

where $S$ denotes a continuous policy over the interval $[0, t]$ and $dG_S$ is a
transition probability determined by this policy.

Functional equations of this type occur in the general theory of
stochastic processes. We shall not pursue this approach in this volume
because of the many difficulties involved in justifying this equation and in
defining general continuous policies. Instead, we shall employ an approach
intermediate between the differential and the integral approach,
which yields a functional equation bearing the same relation to (2) as the
diffusion or heat equation bears to the Chapman-Kolmogoroff equation
in the theory of diffusion processes.

A  justification  of  this  approach  is  the  fact  that  it  can  be  demonstrated 
that  the  solution  of  the  discrete  process  approaches  the  solution  given  by 
the continuous process as $\delta \to 0$. However, as stated above, we shall not
discuss  this  question  here. 

§  4.  Preliminary  discussion 

Let us continue to use the simple equation of (2.1) as our model for the
following discussion. According to the solution discussed earlier in Chapter
II, the A- and B-regions are separated by the boundary curve

(1)  $L_\delta:\ (1 - q_1 \delta)\, \dfrac{r_1 x}{q_1} = (1 - q_2 \delta)\, \dfrac{r_2 y}{q_2}$

which, as $\delta \to 0$, approaches the line

(2)  $L:\ \dfrac{r_1 x}{q_1} = \dfrac{r_2 y}{q_2}$


For each $\delta > 0$, the optimal policy has the following form:

"If below $L_\delta$ continue using the A-policy until in the B-region, above
$L_\delta$. Then use the B-policy until in the A-region, below $L_\delta$, and so on;
similarly if above $L_\delta$ to start."

Geometrically : 


The limiting form of this policy as $\delta \to 0$ is the following:

"If $(x, y)$ is below $L$, use A until the line $L$ is reached, then continue
along $L$ thereafter; if $(x, y)$ is above $L$, use B until the line $L$ is reached,
then continue along $L$ thereafter."


Let us observe that a policy of this type, which requires motion along
$L$, is not included in the set of policies associated with any nonzero $\delta$.
These policies, allowing only the use of A or B, yield broken-line paths
consisting of horizontal and vertical pieces, as in Fig. 1.

It is clear, however, that a path such as that given in Fig. 2 may be
arbitrarily closely approximated by an optimal policy as $\delta \to 0$.
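
The approach of the broken-line paths to the line $L$ can be illustrated numerically. The following is a minimal sketch in Python, not part of the original text; the function names and the sample parameter values used with it are our own assumptions, and only the deterministic motion of the point $(x, y)$ is followed, the survival probabilities being ignored.

```python
def discrete_policy_path(x, y, r1, r2, q1, q2, delta, steps):
    """Follow the discrete rule of this section: A below L_delta, B otherwise.

    Returns the list of visited (x, y) points (hypothetical helper).
    """
    path = [(x, y)]
    for _ in range(steps):
        left = (1 - q1 * delta) * r1 * x / q1    # Anaconda side of L_delta
        right = (1 - q2 * delta) * r2 * y / q2   # Bonanza side of L_delta
        if left > right:
            x -= r1 * x * delta                  # below L_delta: mine Anaconda
        else:
            y -= r2 * y * delta                  # on or above L_delta: mine Bonanza
        path.append((x, y))
    return path


def ratio_to_limit_line(x, y, r1, r2, q1, q2):
    """(r1 x / q1) / (r2 y / q2); this equals 1 exactly on the limiting line L."""
    return (r1 * x / q1) / (r2 * y / q2)
```

Starting from a point well below $L$, the path first moves horizontally and then zigzags in a band about $L$ whose width shrinks with $\delta$; every individual move remains horizontal or vertical.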

This suggests the important point that a continuous version of the
original discrete problem may not possess an optimal policy yielding a
maximum return. Instead there may only exist a sequence of policies
yielding a supremum, unless we suitably broaden the concept of a policy.
The natural way to accomplish this extension is to allow for the mixing of
decisions, in some suitable sense, at each time.

§  5.  Mixing  at  a  point 

The  introduction  of  mixing  at  a  point  is,  however,  with  no  intention  to 
pun,  a  mixed  blessing,  since  it  carries  along  with  it  a  number  of  difficulties 
of  both  physical  and  mathematical  nature.  Mathematically,  we  find 
ourselves  confronted  by  the  same  difficulties  that  made  us  wish  to  bypass 
the  integral  formulation  of  §  3;  physically,  we  are  reluctant  to  accept  a 
policy  which  involves  mixing  decisions  as  one  applicable  to  a  problem 
where  a  choice  of  one  or  the  other  decision  is  required. 

To avoid simultaneously the conceptual difficulties of both mathematical
and physical origin, let us employ an interpretive device which has
been  used  before  in  a  very  similar  situation.  The  essence  of  this  device  is 
the  observation  that,  under  certain  natural  continuity  assumptions, 
mixing  decisions  at  a  point  is  equivalent  to  mixing  decisions  over  small 
intervals  about  the  point. 

We shall assume then, to construct our mathematical model, that we
are considering a process which requires at the times $t = 0, \Delta, 2\Delta$, etc.,
that we determine the proportion of the following time interval of length
$\Delta$ which will be devoted to A and B respectively. Thus, over a typical
interval $[k\Delta, k\Delta + \Delta]$, we devote the first part, $[k\Delta, k\Delta + \varphi_1 \Delta]$, to the
use of A; and over the second part, $[k\Delta + \varphi_1 \Delta, k\Delta + \Delta]$, B is used:

Figure 3. The interval $[k\Delta, (k + 1)\Delta]$, divided at $k\Delta + \varphi_1 \Delta$: A is used on the first part and B on the second.

The choice of $\varphi_1$ will depend upon $k$, or more specifically upon $x(k\Delta)$
and $y(k\Delta)$, and $k$ itself, if the process is finite.

Assuming that $\Delta$ is small, so that the process is sufficiently well
described by first-order effects, we shall in the limit as $\Delta \to 0$ obtain a set of
differential equations which we will use to define our continuous process.1
A continuous policy will now be equivalent to a function $\varphi_1(t)$.

In the next chapter, we shall derive the differential equations. To
illustrate the power of the method we shall, in turn, solve problems
corresponding to the two-choice problem, to the two-choice problem for a
finite number of stages, to the two-choice problem with a nonlinear
utility function, corresponding to the problem discussed in Exercise 1
of Chapter II, and to the three-choice problem of § 13 of that chapter.
Although the analysis is quite detailed, the guiding ideas are simple.

1 Recall the corresponding comment in Chapter VII.

To justify the use of this formalism, it should be shown that the
continuous process obtained in this way is actually the limit of the original
discrete  process  in  a  natural  sense.  This  will  be  discussed  in  the  second 
volume. 

§  6.  Reformulation  of  the  gold-mining  process 

Let us now proceed to carry through the program outlined in the
preceding sections. An interesting feature of the mathematics will be the
continued interplay between the techniques of the classical calculus of
variations and those of dynamic programming.

Let us, to clarify the issue, rephrase the problem we are considering:

"At each of the time instants $t = k\Delta$ we shall have to make a decision
concerning the proportion of the following interval of length $\Delta$ which will be
devoted to the use of the machine in mine A and to the use of the machine
in mine B. This involves the choice of a fraction $\varphi_1$, which depends upon
the amounts of gold in the two mines at time $t$, and upon $t$ itself, if the
process is finite.

We arbitrarily assume that once this proportion $\varphi_1$ has been chosen, the
first part of the interval, $[k\Delta, (k + \varphi_1)\Delta]$, is devoted to use of the machine
in A, and the second part, $[(k + \varphi_1)\Delta, (k + 1)\Delta]$, to use of the machine
in B. If $x$ is the amount of gold in mine A at time $k\Delta$, there is a probability
$1 - q_1 \varphi_1 \Delta$ that an amount $r_1 x \varphi_1 \Delta$ is mined, and that the machine is
undamaged; and a probability $q_1 \varphi_1 \Delta$ that no gold is mined and that the
machine is irretrievably damaged. If mine B contains $y$ at time $k\Delta$ there
is a probability $1 - q_2 \varphi_2 \Delta$ that the amount $r_2 y \varphi_2 \Delta$ is obtained, and
that the machine is undamaged; and a probability $q_2 \varphi_2 \Delta$ that the
operation ceases, where $\varphi_2 = 1 - \varphi_1$.

The problem is to determine the sequence of operations which
maximizes the expected amount of gold mined before the machine is damaged."

§  7.  Derivation  of  the  differential  equations 

It is easily seen that if $\Delta$ is small, permuting the order of operations
in $[k\Delta, (k + 1)\Delta]$ is a second-order effect. It is this feature which allows
mixing over intervals to perform the function of mixing at a point.

A policy now consists of a sequence $\{\varphi_1(k\Delta)\}$, $k = 0, 1, 2, \ldots$. For
any given policy, let

$x(t)$ = amount of gold remaining in A provided the operation has
continued to $t$,

$y(t)$ = amount of gold remaining in B provided the operation has
continued to $t$,

$p(t)$ = probability that the machine survives until $t$, i.e., that the
operation continues until $t$,

$f(t)$ = expected amount of gold mined up to time $t$,

where $t = n\Delta$, $n = 0, 1, 2, \ldots$.

Ignoring the second-order terms in $\Delta$, we have

(1)  $x(t + \Delta) = x(t) - r_1 \varphi_1(t)\, x(t)\, \Delta$
     $y(t + \Delta) = y(t) - r_2 \varphi_2(t)\, y(t)\, \Delta$
     $p(t + \Delta) = p(t)\, \bigl(1 - q_1 \varphi_1(t)\, \Delta - q_2 \varphi_2(t)\, \Delta\bigr)$
     $f(t + \Delta) = f(t) + p(t)\, \bigl[\varphi_1(t)\, r_1 x(t) + \varphi_2(t)\, r_2 y(t)\bigr]\, \Delta$

Letting $\Delta \to 0$, we obtain the system of differential equations

(2)  $dx/dt = -\varphi_1(t)\, r_1 x(t)$,  $x(0) = x_0$,
     $dy/dt = -\varphi_2(t)\, r_2 y(t)$,  $y(0) = y_0$,
     $dp/dt = -p(t)\, [\varphi_1(t)\, q_1 + \varphi_2(t)\, q_2]$,  $p(0) = 1$,
     $df/dt = p(t)\, [\varphi_1(t)\, r_1 x(t) + \varphi_2(t)\, r_2 y(t)]$,  $f(0) = 0$.

We now take these equations as the defining equations of our process, and
ignore their formal origin. The problem we set ourselves is that of
determining $\varphi_1 = \varphi_1(t)$, where

(3)  $0 \le \varphi_1(t) \le 1$,  $\varphi_2(t) = 1 - \varphi_1(t)$,

so as to maximize $f(T)$. A case of particular importance is $T = \infty$.

We  shall  derive  similar  equations  for  the  three-choice  problem  in  §  12 
below. 
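
The difference equations (7.1) can be stepped forward directly, which gives a crude numerical approximation to the system (7.2). The following Python sketch is ours and not part of the original development; the callable `phi1` (a policy given as a function of $t$, $x$, $y$), the function name, and the step size `dt` are assumptions introduced for illustration.

```python
def simulate(phi1, x0, y0, r1, r2, q1, q2, T, dt=1e-3):
    """Step the recurrences (7.1) from t = 0 to t = T.

    phi1 is a hypothetical callable (t, x, y) -> fraction of the next
    interval devoted to mine A; phi2 = 1 - phi1.
    Returns (x(T), y(T), p(T), f(T)).
    """
    x, y, p, f = x0, y0, 1.0, 0.0
    n = int(round(T / dt))
    for k in range(n):
        a = min(1.0, max(0.0, phi1(k * dt, x, y)))   # enforce constraint (7.3)
        b = 1.0 - a
        # All four updates use the values at time t, as in (7.1).
        gain = p * (a * r1 * x + b * r2 * y) * dt
        x, y, p = (x - r1 * a * x * dt,
                   y - r2 * b * y * dt,
                   p * (1.0 - q1 * a * dt - q2 * b * dt))
        f += gain
    return x, y, p, f
```

For the constant policy $\varphi_1 = 1$ and large $T$ the computed return should be close to $r_1 x_0/(q_1 + r_1)$, the value of $f_A(\infty)$ found in § 10, with an error of the order of `dt`.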

§  8.  The  variational  procedure 
Let $\varphi_1$ and $\varphi_2$ be functions furnishing the maximum,* and let

(1)  $\bar\varphi_i = \varphi_i + \epsilon \beta_i(t)$,  $i = 1, 2$,

where $\epsilon$ is a small positive quantity, and $\beta_1$, $\beta_2$ are two functions of $t$
satisfying for all $t \ge 0$ the conditions

(2)  $0 \le \varphi_i + \epsilon \beta_i \le 1$,  $\beta_1 + \beta_2 = 0$

(which implies $|\beta_i| \le 1/\epsilon$), so that the $\bar\varphi_i$ are also admissible $\varphi$'s.

* It is easy to show, as a consequence of the uniform boundedness of the function
$\varphi_1(t)$, that the maximum is attained.

It follows that $\beta_i(t) \le 0$ if $\varphi_i(t) = 1$, $\beta_i(t) \ge 0$ if $\varphi_i(t) = 0$, and $\beta_i$ can
be of either sign if $0 < \varphi_i(t) < 1$, the region where free variation is
permitted. Performing the variation, we find readily that

(3)  $\bar x(t) = x(t)\, (1 - \epsilon r_1 B_1(t)) + o(\epsilon)$*
     $\bar y(t) = y(t)\, (1 - \epsilon r_2 B_2(t)) + o(\epsilon)$
     $\bar p(t) = p(t)\, (1 - \epsilon q_1 B_1(t) - \epsilon q_2 B_2(t)) + o(\epsilon)$
     $\bar f(T) - f(T) = \epsilon \int_0^T \bigl\{ -f'(t)\, (q_1 B_1(t) + q_2 B_2(t)) + r_1 B_1(t)\, p(t)\, x'(t) + r_2 B_2(t)\, p(t)\, y'(t) + r_1 \beta_1(t)\, p(t)\, x(t) + r_2 \beta_2(t)\, p(t)\, y(t) \bigr\}\, dt + o(\epsilon)$

where we have set

(4)  $B_i(t) = \int_0^t \beta_i(s)\, ds$

and the bars refer to the perturbed variables.

Integrating by parts to eliminate the $B_i(t)$, we find

(5)  $\bar f(T) - f(T) = \epsilon \int_0^T [K_1(t)\, \beta_1(t) + K_2(t)\, \beta_2(t)]\, dt + o(\epsilon)$

where

(6)  $K_1(t) = -q_1 \int_t^T f'(s)\, ds + r_1 p(T)\, x(T) - r_1 \int_t^T p'(s)\, x(s)\, ds$
     $K_2(t) = -q_2 \int_t^T f'(s)\, ds + r_2 p(T)\, y(T) - r_2 \int_t^T p'(s)\, y(s)\, ds$

Since $\bar f(T) - f(T) \le 0$, we see that whenever $K_i(t) > K_j(t)$ we must
have $\varphi_i(t) = 1$, $\varphi_j(t) = 0$. These relations yield implicit equations for
$\varphi_i$ and $\varphi_j$. In the next section we shall discuss the behavior of the
$K$-functions in more detail, in order to determine $\varphi_i(t)$ explicitly.
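
The first-variation formula (8.5) lends itself to a numerical check. The sketch below is ours, in Python, and rests on simplifying assumptions that are not in the text: the unperturbed policy $\varphi_1$ and the variation $\beta_1$ are taken constant, $\beta_2 = -\beta_1$, and all integrals are replaced by sums on a uniform grid of `n` steps. It compares the actual change in $f(T)$ with $\epsilon \int_0^T (K_1 - K_2)\, \beta_1\, dt$, the $K_i$ being built from (8.6).

```python
def first_variation_check(r1, r2, q1, q2, x0, y0, T, phi1, beta1, eps, n=20000):
    """Return (f_bar(T) - f(T), eps * integral of (K1 - K2) * beta1 dt)."""
    dt = T / n

    def trajectory(a):
        # Forward steps of (7.1) for the constant policy phi1 = a.
        b = 1.0 - a
        xs, ys, ps, fs = [x0], [y0], [1.0], [0.0]
        for _ in range(n):
            x, y, p, f = xs[-1], ys[-1], ps[-1], fs[-1]
            xs.append(x - r1 * a * x * dt)
            ys.append(y - r2 * b * y * dt)
            ps.append(p * (1.0 - (q1 * a + q2 * b) * dt))
            fs.append(f + p * (a * r1 * x + b * r2 * y) * dt)
        return xs, ys, ps, fs

    xs, ys, ps, fs = trajectory(phi1)
    f_bar = trajectory(phi1 + eps * beta1)[3][-1]

    # Accumulate the tail integrals of (8.6) backwards from t = T.
    tail_f = tail_px = tail_py = 0.0
    integral = 0.0
    for k in range(n - 1, -1, -1):
        tail_f += fs[k + 1] - fs[k]        # integral of f'(s) over [t_k, T]
        dp = ps[k + 1] - ps[k]
        tail_px += dp * xs[k]              # integral of p'(s) x(s) over [t_k, T]
        tail_py += dp * ys[k]              # integral of p'(s) y(s) over [t_k, T]
        k1 = -q1 * tail_f + r1 * ps[-1] * xs[-1] - r1 * tail_px
        k2 = -q2 * tail_f + r2 * ps[-1] * ys[-1] - r2 * tail_py
        integral += (k1 - k2) * beta1 * dt
    return f_bar - fs[-1], eps * integral
```

With a small $\epsilon$ and a fine grid the two returned numbers should agree to within the discretization error, which is what (8.5) asserts to first order.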

§ 9. The behavior of $K_i$

The fundamental relation is

(1)  $\dfrac{d}{dt}(K_1 - K_2) = (q_1 - q_2)\, f'(t) - p'(t)\, (r_2 y - r_1 x) = p\, [q_1 r_2 y - q_2 r_1 x]$.

* The term $o(\epsilon)$ denotes a function of $t$ which approaches 0 as $\epsilon \to 0$ for all $t$
in $[0, T]$.

Thus a "mixed policy" (one for which more than one of the $\varphi_i$ is positive
for a given $t$, which implies $K_1(t) = K_2(t)$) can be optimal only on the
line $q_1 r_2 y = q_2 r_1 x$. This line is precisely the boundary line that one
obtains by passage to the limit from the solution in the discrete case as
$\Delta \to 0$, as in § 4.⁴

If a mixed policy is pursued along the line, $\varphi_1$ and $\varphi_2$ must be chosen to
stay on this line, which means that the slope, $s = y/x$, must be kept
constant. Since

(2)  $\dfrac{d}{dt}(y/x) = y'/x - (x'/x)\, s(t) = [r_1 \varphi_1 - r_2 \varphi_2]\, s$

we see that we must have

(3)  $\varphi_1 = \dfrac{r_2}{r_1 + r_2}$,  $\varphi_2 = \dfrac{r_1}{r_1 + r_2}$
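
As a small check of (9.2) and (9.3), the following Python sketch (ours; the function names are hypothetical) computes the mixing fractions and the relative rate of change of the slope $s = y/x$ under a given pair $(\varphi_1, \varphi_2)$.

```python
def mixed_fractions(r1, r2):
    """Fractions (phi1, phi2) of (9.3), which hold the slope y/x constant."""
    total = r1 + r2
    return r2 / total, r1 / total


def relative_slope_rate(phi1, phi2, r1, r2):
    """(1/s) ds/dt taken from (9.2): r1*phi1 - r2*phi2."""
    return r1 * phi1 - r2 * phi2
```

For any positive $r_1$, $r_2$ the fractions sum to one and the relative slope rate vanishes, while the pure policies $\varphi_1 = 1$ and $\varphi_2 = 1$ give the rates $r_1$ and $-r_2$ respectively.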


§ 10. The solution for $T = \infty$

With these preliminaries out of the way, let us determine the optimal
policy for the infinite process, $T = \infty$. The infinite problem is, as usual,
simpler than the finite case because of the homogeneity introduced by
infinite time; after any initial actions, we are confronted by a problem of
the same type, with different initial values. Let us note that a
consequence of this, and the homogeneity of the equations with respect to $x$
and  y,  is  that  the  decision  at  any  point  is  a  function  only  of  the  slope 
s  =  y/x. 

Let us begin by observing that if policy A is ever used above the line
$q_1 r_2 y = q_2 r_1 x$ in the $(x, y)$-plane, it is used thereafter. This follows
immediately from (9.1), which shows that $K_1 - K_2$ is increasing when
$q_1 r_2 y - q_2 r_1 x > 0$. Since use of A decreases $x$ and leaves $y$ unchanged, once
$K_1 > K_2$ the use of A maintains the inequality.

Near the $y$-axis, however, the continual use of A is not as rewarding
as continual use of B. For with $\varphi_1 = 1$, $\varphi_2 = 0$, for $t \ge 0$, we have

(1)  $x(t) = x_0\, e^{-r_1 t}$
     $y(t) = y_0$
     $p(t) = e^{-q_1 t}$
     $f(t) = \int_0^t r_1 x_0\, e^{-r_1 s}\, e^{-q_1 s}\, ds$

and thus

$f_A(\infty) = r_1 x_0 / (q_1 + r_1)$.

4 Having been led to expect the appearance of this line as a consequence of
the analysis of the discrete case, it is relatively easy to spot it.

However, $\varphi_1 = 0$, $\varphi_2 = 1$ for all $t$ yields similarly $f_B(\infty) = r_2 y_0 / (q_2 + r_2)$.
For $y_0/x_0$ sufficiently large, $f_B(\infty) > f_A(\infty)$. Thus, there is a region near
the $y$-axis where B is used.

This region where B is used extends down to the line $q_1 r_2 y = q_2 r_1 x$.
To prove this we observe that a mixed policy cannot be pursued above
the line, and that if A is ever used above the line it is always used
thereafter. Using A indefinitely, however, would eventually take $(x, y)$ into
the region near the $y$-axis where B is known to be optimal, a
contradiction. Hence B is always used above the line. Similarly, below the line A
is always used.

When the line $q_1 r_2 y = q_2 r_1 x$ is reached, the point $(x, y)$ must remain
on the line thereafter. For if not, then an A policy must be used in a B
region or vice versa, which is impossible. Hence, on the line itself the
mixed policy of (9.3) must be employed.

We  have  thus  demonstrated 


Theorem 1. With reference to the equations (7.2) and the constraints (7.3),
the maximum value of $f(\infty)$ is attained by use of the policy

(3)  $\varphi_1 = 1$ for $q_1 r_2 y < q_2 r_1 x$,
     $\varphi_2 = 1$ for $q_1 r_2 y > q_2 r_1 x$,
     $\varphi_1 = \dfrac{r_2}{r_1 + r_2}$, $\varphi_2 = \dfrac{r_1}{r_1 + r_2}$ for $q_1 r_2 y = q_2 r_1 x$.

Note that $\varphi_1$ and $\varphi_2$ are determined almost everywhere by the above
arguments, and hence are essentially unique. The above constructive
derivation of the solution furnishes an alternative existence proof.
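
The return furnished by the policy of Theorem 1 can be written in closed form by integrating (7.2) along each piece of the path. The Python sketch below is our own working and is not given in the text: it assumes $x, y > 0$, and it uses the fact that on the line the mixed policy of (9.3) reduces $x$ and $y$ at the common rate $r_1 r_2/(r_1 + r_2)$, which gives the return $r_1 r_2 (x + y)/(q_1 r_2 + q_2 r_1 + r_1 r_2)$ from a point of the line. All function names are hypothetical.

```python
import math


def pure_a(x, r1, q1):
    """f_A(infinity) of (10.1)."""
    return r1 * x / (q1 + r1)


def pure_b(y, r2, q2):
    """f_B(infinity), the analogue of (10.1) for continual use of B."""
    return r2 * y / (q2 + r2)


def value_on_line(x, y, r1, r2, q1, q2):
    """Return of the mixed policy (9.3) pursued forever (our derivation)."""
    return r1 * r2 * (x + y) / (q1 * r2 + q2 * r1 + r1 * r2)


def theorem1_value(x, y, r1, r2, q1, q2):
    """Expected return of the policy of Theorem 1 from (x, y), with x, y > 0."""
    c = q1 * r2 * y - q2 * r1 * x
    if c < 0:                                  # below the line: A until it is reached
        x_star = q1 * r2 * y / (q2 * r1)
        t = math.log(x / x_star) / r1
        gained = r1 * x * (1.0 - math.exp(-(q1 + r1) * t)) / (q1 + r1)
        return gained + math.exp(-q1 * t) * value_on_line(x_star, y, r1, r2, q1, q2)
    if c > 0:                                  # above the line: B until it is reached
        y_star = q2 * r1 * x / (q1 * r2)
        t = math.log(y / y_star) / r2
        gained = r2 * y * (1.0 - math.exp(-(q2 + r2) * t)) / (q2 + r2)
        return gained + math.exp(-q2 * t) * value_on_line(x, y_star, r1, r2, q1, q2)
    return value_on_line(x, y, r1, r2, q1, q2)
```

With $r_1 = r_2 = q_1 = q_2 = 1$ and $(x, y) = (2, 1)$, for instance, A is used for a time $\log 2$ and the mixed policy thereafter; the resulting return exceeds that of either pure policy.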


§  11.  Solution  for  finite  total  time 

In  finding  the  solution  for  finite  T,  we  shall  begin  by  determining  what 
policy  is  used  last.  Since  an  optimal  policy  has  the  property  that  its 
continuation  after  any  initial  part  is  also  optimal,  we  shall  consider  first 
the  case  where  T  is  small.  We  have 


(1)  $f(T) = \int_0^T p(s)\, [\varphi_1(s)\, r_1 x(s) + \varphi_2(s)\, r_2 y(s)]\, ds$
     $= r_1 x_0 \int_0^T \varphi_1(s)\, ds + r_2 y_0 \int_0^T \varphi_2(s)\, ds + o(T)$

for $T$ close to 0.

It follows then that for small $T$ the maximum is obtained by taking
$\varphi_1(s) = 1$, $\varphi_2(s) = 0$ for $r_1 x_0 > r_2 y_0$ and $\varphi_1(s) = 0$, $\varphi_2(s) = 1$ for
$r_2 y_0 > r_1 x_0$. As is to be expected, for processes of small duration expected
gain, without worry about termination, is the determining factor.

If $q_1 = q_2$ the lines $r_2 y = r_1 x$ and $q_1 r_2 y = q_2 r_1 x$ coincide, and the
optimal policy is easily found to be the same as that for $T = \infty$.

Let us consider the general case where $q_1 \ne q_2$. Assume, without loss
of generality, that the line $r_2 y = r_1 x$ lies above the line $q_1 r_2 y = q_2 r_1 x$.
The positive quadrant is then divided into three regions, which we label
I, II, III (Fig. 4).


Figure 4

As before, it follows that in region I a B-policy once used must be
continued thereafter, while in regions II and III the same holds for an
A-policy. Also, in regions I and II an A-policy is used if the time remaining
is sufficiently small, and in III a B-policy under the same conditions.
From this we conclude that an A-policy is always used in I, and a B-policy
always while in III.

Let us now establish that an optimal policy never switches from A to
B. Let us suppose otherwise and let $t_0$ be the time at which the change
occurs. Since at $t_0$, A is terminated, the point $(x(t_0), y(t_0))$ must be in
region I, or on the boundary between I and II. Using B will keep the
point $(x(t), y(t))$ in I for all $t > t_0$ since we know that B once used in I
must be continued. However, this contradicts the fact that A is used in I
whenever the time remaining is sufficiently small. Similarly, the
combination of using the mixed policy and then B cannot occur, since the
changeover must occur on the boundary between I and II, and then B is used
thereafter in region I, a contradiction.

This reduces the number of types of solutions to six: A always; B
always; the mixed policy followed by A; A then the mixed policy and
finally A; B then the mixed policy and then A; B followed by A.

Let $t_0$ be the value of $t$ at which the last change of policy is made in an
optimal strategy, if such a change occurs. For $t_0 < t \le T$, we must have
$\varphi_1(t) = 1$, $\varphi_2(t) = 0$. We now compute the value of $K_1(t_0) - K_2(t_0)$.
We have for $t_0 < t \le T$,

(2)  $x(t) = x(t_0)\, e^{-r_1 (t - t_0)}$,  $y(t) = y(t_0)$,
     $p(t) = p(t_0)\, e^{-q_1 (t - t_0)}$,
     $f'(t) = p(t_0)\, e^{-(q_1 + r_1)(t - t_0)}\, r_1 x(t_0)$,

and, after some simplification,

(3)  $K_1(t_0) - K_2(t_0) = p(t_0)\, r_1 x(t_0) \left[ \left(1 - \dfrac{q_2}{q_1 + r_1}\right) e^{-(q_1 + r_1)(T - t_0)} + \dfrac{q_2}{q_1 + r_1} - \dfrac{r_2\, y(t_0)}{r_1\, x(t_0)} \right]$

For any fixed point $(x(t_0), y(t_0))$ in II, the right side is positive for
$T - t_0$ small, and negative for $T - t_0$ large. It is equal to zero for
precisely one value of $T - t_0$. This zero determines when the changeover
occurs. When it occurs, A is used for the remaining time, with any of the
six beginnings above, depending upon the location of the initial point.
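
The zero of the bracket in (11.3) can be found explicitly, since the bracket is a constant plus a multiple of a single exponential. The Python sketch below is ours; it assumes $q_2 < q_1 + r_1$ (which holds under the standing assumption that the line $r_2 y = r_1 x$ lies above the line $q_1 r_2 y = q_2 r_1 x$), and the function names are hypothetical.

```python
import math


def bracket(tau, x, y, r1, r2, q1, q2):
    """The bracketed factor of (11.3) as a function of tau = T - t0."""
    a = q2 / (q1 + r1)
    return (1.0 - a) * math.exp(-(q1 + r1) * tau) + a - (r2 * y) / (r1 * x)


def switch_time_remaining(x, y, r1, r2, q1, q2):
    """Value of T - t0 at which (11.3) vanishes, or None if no positive root exists."""
    a = q2 / (q1 + r1)
    if a >= 1.0:
        return None                       # outside the case assumed here
    ratio = ((r2 * y) / (r1 * x) - a) / (1.0 - a)
    if not 0.0 < ratio < 1.0:
        return None                       # the bracket keeps one sign for tau > 0
    return -math.log(ratio) / (q1 + r1)
```

For a point of region II the function returns the unique time-to-go at which the final change to A is made; for a point of region III, where $r_2 y > r_1 x$, no positive root exists and `None` is returned.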

§  12.  The  three-choice  problem 

The continuous version of the three-choice problem mentioned above
in § 13 of Chapter II leads via the same formal process as given in § 7 to the
following. Given

(1)  $dx/dt = -[\varphi_1(t)\, r_1 + \varphi_3(t)\, r_3]\, x(t)$,  $x(0) = x_0$,
     $dy/dt = -[\varphi_2(t)\, r_2 + \varphi_3(t)\, r_4]\, y(t)$,  $y(0) = y_0$,
     $dp/dt = -p(t)\, [\varphi_1(t)\, q_1 + \varphi_2(t)\, q_2 + \varphi_3(t)\, q_3]$,  $p(0) = 1$,
     $df/dt = p(t)\, [(\varphi_1(t)\, r_1 + \varphi_3(t)\, r_3)\, x(t) + (\varphi_2(t)\, r_2 + \varphi_3(t)\, r_4)\, y(t)]$,  $f(0) = 0$,

where, for all $t$,

(2)  $\varphi_1 + \varphi_2 + \varphi_3 = 1$,  $\varphi_i \ge 0$,

it is required to determine the $\varphi_i(t)$ so as to maximize $f(T)$.

We shall consider only the case where $T = \infty$.

As before, let us set $\bar\varphi_i = \varphi_i + \epsilon \beta_i$ and $B_i(t) = \int_0^t \beta_i(s)\, ds$.

We obtain

(3)  $\bar x(t) = x(t)\, (1 - \epsilon r_1 B_1(t) - \epsilon r_3 B_3(t)) + o(\epsilon)$
     $\bar y(t) = y(t)\, (1 - \epsilon r_2 B_2(t) - \epsilon r_4 B_3(t)) + o(\epsilon)$
     $\bar p(t) = p(t)\, \bigl(1 - \epsilon \sum_{i=1}^{3} q_i B_i(t)\bigr) + o(\epsilon)$
     $d\bar f/dt = \bar p\, [(\bar\varphi_1 r_1 + \bar\varphi_3 r_3)\, \bar x + (\bar\varphi_2 r_2 + \bar\varphi_3 r_4)\, \bar y]$

Consequently, following the same technique as before, we obtain

(4)  $\bar f(T) - f(T) = \epsilon \int_0^T [K_1 \beta_1 + K_2 \beta_2 + K_3 \beta_3]\, dt + o(\epsilon)$

where

(5)  $K_1(t) = -q_1 \int_t^T f'(s)\, ds + r_1 p(T)\, x(T) - r_1 \int_t^T p'(s)\, x(s)\, ds$
     $K_2(t) = -q_2 \int_t^T f'(s)\, ds + r_2 p(T)\, y(T) - r_2 \int_t^T p'(s)\, y(s)\, ds$
     $K_3(t) = -q_3 \int_t^T f'(s)\, ds + p(T)\, [r_3 x(T) + r_4 y(T)] - \int_t^T p'(s)\, [r_3 x(s) + r_4 y(s)]\, ds$

§ 13. Some lemmas and preliminary results

The statements in the lemmas below concerning the dependence of the $\varphi_i$
upon the $K_i$ are, of course, taken to hold almost everywhere.

Lemma 1. If $K_i(t) > K_j(t)$, then $\varphi_i(t) = 1$ or $\varphi_j(t) = 0$.

Proof: Let $E$ be the set of $t$ for which the assertion does not hold. Let
$\beta_i = 1$, $\beta_j = -1$ for $t$ in $E$, and let the $\beta$'s be zero otherwise. The
variation is admissible for $\epsilon$ sufficiently small and makes $\bar f(T) - f(T)$ positive
if $m(E) > 0$.

Lemma 2. If $K_i(t) > K_j(t)$ for $j \ne i$, then $\varphi_i = 1$.

The proof follows immediately from the above.

Lemma 3. If there is a $j$ such that $K_i(t) < K_j(t)$, then $\varphi_i = 0$.

Again a simple consequence of Lemma 1.

Let us now compute the derivatives of the $K_i$. A straightforward
calculation yields the symmetric results

(1)  $K_1'(t) = p\, [C_1 \varphi_2 + C_2 \varphi_3]$
     $K_2'(t) = p\, [-C_1 \varphi_1 - C_3 \varphi_3]$
     $K_3'(t) = p\, [-C_2 \varphi_1 + C_3 \varphi_2]$

where we have set

(2)  $C_1 = q_1 r_2 y - q_2 r_1 x$
     $C_2 = q_1 r_4 y - (q_3 r_1 - q_1 r_3)\, x$
     $C_3 = (q_3 r_2 - q_2 r_4)\, y - q_2 r_3 x$

The relative positions of the three lines $C_i = 0$ are determined by the
quantity

(3)  $D = q_1 r_2 r_3 + q_2 r_1 r_4 - q_3 r_1 r_2$

If we assume that all three lines lie in the positive quadrant, a
straightforward calculation shows that if $D > 0$ the lines have the position shown
in Fig. 5, while if $D < 0$ they lie as shown in Fig. 6.

Figure 5        Figure 6

It is possible for both cases $D > 0$, $D < 0$ to occur. The case where one
of the lines $C_2 = 0$, $C_3 = 0$ lies outside the positive quadrant yields an
immediate simplification of the following arguments without changing
the over-all structure. Consequently, we shall discuss in detail only the
above cases.
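
The formulas (13.1) and the role of $D$ can be checked numerically. In the Python sketch below, which is ours, the derivatives $K_i'$ are computed in two ways: directly, by differentiating (12.5) and substituting $f'$ and $p'$ from (12.1), and from (13.1) and (13.2). The last function rests on an observation of our own, namely that the slope of the line $C_1 = 0$ exceeds that of $C_2 = 0$ by $D/(q_1 r_2 r_4)$. The function names and the tuple conventions `r = (r1, r2, r3, r4)`, `q = (q1, q2, q3)` are assumptions.

```python
def c_coefficients(x, y, r, q):
    """C1, C2, C3 of (13.2)."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    c1 = q1 * r2 * y - q2 * r1 * x
    c2 = q1 * r4 * y - (q3 * r1 - q1 * r3) * x
    c3 = (q3 * r2 - q2 * r4) * y - q2 * r3 * x
    return c1, c2, c3


def k_derivatives_direct(x, y, p, phi, r, q):
    """K_i' from (12.5): K1' = q1 f' + r1 p' x, and similarly for K2', K3'."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    phi1, phi2, phi3 = phi
    f_dot = p * ((phi1 * r1 + phi3 * r3) * x + (phi2 * r2 + phi3 * r4) * y)
    p_dot = -p * (phi1 * q1 + phi2 * q2 + phi3 * q3)
    return (q1 * f_dot + r1 * p_dot * x,
            q2 * f_dot + r2 * p_dot * y,
            q3 * f_dot + p_dot * (r3 * x + r4 * y))


def k_derivatives_formula(x, y, p, phi, r, q):
    """K_i' as given by (13.1)."""
    c1, c2, c3 = c_coefficients(x, y, r, q)
    phi1, phi2, phi3 = phi
    return (p * (c1 * phi2 + c2 * phi3),
            p * (-c1 * phi1 - c3 * phi3),
            p * (-c2 * phi1 + c3 * phi2))


def d_quantity(r, q):
    """D of (13.3)."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    return q1 * r2 * r3 + q2 * r1 * r4 - q3 * r1 * r2


def slope_gap_c1_c2(r, q):
    """Slope of the line C1 = 0 minus slope of the line C2 = 0."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    return q2 * r1 / (q1 * r2) - (q3 * r1 - q1 * r3) / (q1 * r4)
```

The two sets of derivatives agree for arbitrary values of the arguments, and the sign of $D$ decides which of the lines $C_1 = 0$, $C_2 = 0$ is the steeper.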

§  14.  Mixed  policies 

As above, we denote by the term "mixed policy" a situation in which
some of the $\varphi_i$ have values different from 0 and 1. By an A-policy we
shall mean $\varphi_1 = 1$, a B-policy $\varphi_2 = 1$, and a C-policy $\varphi_3 = 1$. Let us prove

Lemma 4. No optimal policy contains a mixture of A, B, and C policies.

Proof: Let us assume that in some interval we have simultaneously
$\varphi_1, \varphi_2, \varphi_3 > 0$. In this interval we must have $K_1 = K_2 = K_3$.

This yields

(1)  $\varphi_1 + \varphi_2 + \varphi_3 = 1$
     $K_1' - K_2' = p\, [C_1 \varphi_1 + C_1 \varphi_2 + (C_2 + C_3)\, \varphi_3] = 0$
     $K_1' - K_3' = p\, [C_2 \varphi_1 + (C_1 - C_3)\, \varphi_2 + C_2 \varphi_3] = 0$

The solution for the $\varphi_i$ is, if $C_1 - C_2 - C_3 \ne 0$,

(2)  $\varphi_1 = \dfrac{-C_3}{C_1 - C_2 - C_3}$,  $\varphi_2 = \dfrac{-C_2}{C_1 - C_2 - C_3}$,  $\varphi_3 = \dfrac{C_1}{C_1 - C_2 - C_3}$

Since the $\varphi_i$ must be positive in this interval, we must have $C_1$, $-C_2$,
and $-C_3$ all of the same sign. It is easily verified upon referring to Figs.
5 and 6 that in both cases $D > 0$, $D < 0$, this can never occur.

Furthermore, $C_1 - C_2 - C_3 = 0$ only if the lines $C_1 = 0$, $C_2 = 0$,
$C_3 = 0$ coincide. When this occurs the problem is equivalent to the
two-choice problem.

Let us now investigate the possibility of using mixed policies involving
only two of the three policies, A, B, or C.

Lemma 5. Concerning the mixing of two and only two policies, we have the
following results:

(3)  (a) A mixture of A and B is permissible only along $C_1 = 0$, where
         $\varphi_1 = r_2/(r_1 + r_2)$, $\varphi_2 = r_1/(r_1 + r_2)$.

     (b) A mixture of A and C is permissible only along $C_2 = 0$, where
         $\varphi_1 = \dfrac{r_4 - r_3}{r_1 + r_4 - r_3}$, $\varphi_3 = \dfrac{r_1}{r_1 + r_4 - r_3}$.

     (c) A mixture of B and C is permissible only along $C_3 = 0$, where
         $\varphi_2 = \dfrac{r_3 - r_4}{r_2 + r_3 - r_4}$, $\varphi_3 = \dfrac{r_2}{r_2 + r_3 - r_4}$.

Proof: If $\varphi_1, \varphi_2 > 0$, $\varphi_3 = 0$, we must have $K_1 = K_2 > K_3$. In an
interval where this occurs,

(4)  $0 = K_1' - K_2' = p\, [C_1 (\varphi_1 + \varphi_2)]$

Hence $C_1 = 0$. The values of $\varphi_1$ and $\varphi_2$ which keep $(x, y)$ on this line are
determined as in the two-choice case. The other assertions in Lemma 5
are obtained similarly.
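
The fractions of Lemma 5 are precisely those which hold the slope $y/x$ fixed, and this can be verified mechanically. The Python sketch below is ours; the function names and the tuple convention `r = (r1, r2, r3, r4)` are hypothetical, and the slope rate is obtained from (12.1) as $(1/s)\, ds/dt = (\varphi_1 r_1 + \varphi_3 r_3) - (\varphi_2 r_2 + \varphi_3 r_4)$.

```python
def relative_slope_rate(phi, r):
    """(1/s) ds/dt for s = y/x under the system (12.1)."""
    phi1, phi2, phi3 = phi
    r1, r2, r3, r4 = r
    return (phi1 * r1 + phi3 * r3) - (phi2 * r2 + phi3 * r4)


def mix_ab(r):
    """Fractions of Lemma 5(a)."""
    r1, r2, r3, r4 = r
    return (r2 / (r1 + r2), r1 / (r1 + r2), 0.0)


def mix_ac(r):
    """Fractions of Lemma 5(b)."""
    r1, r2, r3, r4 = r
    d = r1 + r4 - r3
    return ((r4 - r3) / d, 0.0, r1 / d)


def mix_bc(r):
    """Fractions of Lemma 5(c)."""
    r1, r2, r3, r4 = r
    d = r2 + r3 - r4
    return (0.0, (r3 - r4) / d, r2 / d)


def admissible(phi):
    """True when every fraction lies in [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in phi)
```

With $r_3 > r_4$ the AC fractions keep the slope constant only formally, one of them being negative, which is the observation used at the start of § 15.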


§  15.  The  solution  for  infinite  time,  D  >  0 

Having obtained these auxiliary results, we now proceed to find the
solution to the problem of maximizing $f(\infty)$. We shall assume that $r_3 > r_4$,
since the case $r_4 > r_3$ can be handled by interchanging the roles of $x$
and $y$ and A and B. The degenerate case, $r_3 = r_4$, will be discussed
separately.

Let us make an initial observation that when $r_3 > r_4$ the mixed
AC policy is never used, for by (14.3) $\varphi_1$ and $\varphi_3$ cannot both be positive.
The solution takes two distinct forms depending upon whether $D > 0$ or
$D < 0$. Let us begin by considering $D > 0$. We shall establish the
principal results in a series of lemmas.


Lemma 6. In an optimal policy, B is used near the y-axis.


Proof: There is a region near the $y$-axis where A is not used. For if
$C_1 > 0$, $C_2 > 0$ and A is used, i.e., $\varphi_1(t) = 1$, we have $K_1' = 0$, $K_2' < 0$,
$K_3' < 0$. This means that $K_1$ remains the largest for $t_1 \ge t$. Hence,
if A is used in this region, it must be pursued thereafter. Let us now
compute the results of a continued A-policy, a continued B-policy, and a
continued C-policy. We have

(1)  $f_A(\infty) = r_1 x_0 / (q_1 + r_1)$
     $f_B(\infty) = r_2 y_0 / (q_2 + r_2)$
     $f_C(\infty) = \dfrac{r_3 x_0}{q_3 + r_3} + \dfrac{r_4 y_0}{q_3 + r_4}$

A comparison of $f_A(\infty)$ and $f_B(\infty)$ shows that $f_B(\infty) > f_A(\infty)$ for
$y_0/x_0$ sufficiently large.

Let  us  now  show  that  in  the  region  above  the  line  Cs  =  0,  if  C  is  used 
it  is  used  continually  thereafter.  Using  C  increases  the  slope  s  ( t )  = 
y  ( t)/x  (t),  for  with  9,  =  1  we  have 

(2)  s'  ( t )  =  s  (l)  [r3  —  rt)  >0 

On the other hand, using B decreases the slope. Hence, we cannot use B
after C, for to do so would return us to a region where C was to be used.
We have already shown that A cannot be used after C when close to the
y-axis. A comparison of f_B(∞) and f_C(∞) shows that it is better to use
B rather than C near the y-axis if r2 y/(q2 + r2) > r4 y/(q3 + r4), or
q3 r2 − q2 r4 > 0. This, however, is precisely equivalent to the condition
that C3 = 0 lie within the positive quadrant, which we have assumed.
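The comparison of B with C near the y-axis reduces to a cross-multiplication. The lines below assume the closed forms f_B(∞) = r2 y0/(q2 + r2) and f_C(∞) = r3 x0/(q3 + r3) + r4 y0/(q3 + r4), positive q's and r's, and a point so close to the y-axis that the x-term of f_C may be neglected:

```latex
\[
\frac{r_2\,y}{q_2 + r_2} > \frac{r_4\,y}{q_3 + r_4}
\iff r_2\,(q_3 + r_4) > r_4\,(q_2 + r_2)
\iff q_3 r_2 - q_2 r_4 > 0 .
\]
```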

It  follows  that  there  is  a  region  near  the  y-axis  where  neither  A  nor  C 
is  used.  Since  by  Lemma  5  no  mixed  policy  is  used  above  the  line  C3  =  0, 
we  conclude  that  there  is  a  region  adjoining  the  y-axis  where  B  must  be 
used. 


Lemma 7. The lower boundary of the B-region adjoining the y-axis is the
line C3 = 0. On that line a mixed BC-policy is employed. Below C3 = 0,
B is never used.

Proof: Let us begin with initial values (x0, y0) near the y-axis in the
region where B is used and consider what form an optimal strategy can
have. B cannot be used indefinitely since this would eventually take
(x, y) near the x-axis where comparison of f_A(∞) and f_B(∞) shows that
A is superior. However, since both A and C increase the slope y/x, B
cannot be followed by A or C since both of these would immediately put
the point (x, y) back into a region where B is to be used. Consequently, B
must be followed by one of the mixed policies.

As we have already seen, for r3 > r4 the mixed policy AC is never used
in an optimal strategy. We assert that if a mixed policy is used in an
optimal strategy, then continuing the mixed policy forever is optimal. For
let (t0, t1) be an interval on which the mixed policy is pursued. Since the
point (x(t1), y(t1)) lies on the same ray as (x(t0), y(t0)), because of the
homogeneity the same policy, continued for an equal length of time,
is optimal. Hence the mixed policy may be continued forever. Taking
this remark into account, we can show that for D > 0 the mixture AB
never occurs in an optimal strategy. By Lemma 5a, AB could only be
used on the line C1 = 0. If AB were used there, we would have

$K_3' = p\,[C_3 \varphi_2 - C_2 \varphi_1] < 0$

since C2 > 0 and C3 < 0 there (cf. Fig. 2). Since K1(∞) = K2(∞) =
K3(∞) = 0 and K1' = K2' = 0 while AB is being used, it follows that
K3 > K1 = K2 while the AB-mixture is being used. This, however,
implies that φ3 = 1, φ1 = φ2 = 0, which is a contradiction.

The remaining possibility then is that BC is used after B on the line
C3 = 0. B cannot be used below this line as a consequence of the above
arguments.

Lemma 8. There is a line L = 0 between C2 = 0 and the x-axis such that
C is used in the region between C3 = 0 and L = 0, and the policy A is used
in the region below L = 0.

Proof: By the results already established we know that the only policies
which can be used in the region below the line C3 = 0 are A and C. Since
both of these policies increase the slope exponentially, eventually the
point (x, y) will reach the line C3 = 0 where the mixed policy BC is
employed.

Let us investigate the possibilities of changes from A to C and from
C to A. By (13.1) we have

$K_1'(t) - K_3'(t) = p\,[C_1 \varphi_2 + C_2 \varphi_3 + C_2 \varphi_1 - C_3 \varphi_2]$

and hence when only C or A is used,

(3)  $K_1'(t) - K_3'(t) = p\,C_2\,[\varphi_1 + \varphi_3]$

which is positive above C2 = 0 and negative below. Now in a changeover
from C to A we must have K1' − K3' ≥ 0. Consequently, a change from
C to A cannot occur below C2 = 0. Similarly, we observe that a change
from A to C cannot occur above C2 = 0. Also there cannot be a change
from A to BC because when A is used above C2 = 0, K1 − K3 is positive
and increases; hence BC, which requires K3 ≥ K1, cannot be used. Thus
the assumption that A can be used above C2 = 0 leads to a contradiction,
since, as we know, BC must be used eventually.

We also can prove that a change from A to C cannot occur on the line
C2 = 0. For suppose that such a change occurred. At this time of change
we would have K1 = K3. The C-policy will then take the point (x, y)
above the line C2 = 0 where K1' − K3' > 0, hence K1 > K3, which
means that A must be used, a contradiction.

There are now two possible cases:

(1) C is used in the entire region below C2 = 0.

(2) There is a line L = 0 lying between the x-axis and C2 = 0 such that
A is used below L = 0 and C is used above.

The following proof by contradiction shows that the first case does not
occur. Let (x0, y0) be a point below C2 = 0. By assumption C and BC
are the only policies used so that we must have K3'(t) = 0 for all t ≥ 0.
Since K3(∞) = 0, we have K3(0) = 0. Because C is preferable at (x0,
y0), we must have 0 = K3(0) ≥ K1(0). Hence, since K1(∞) = 0, we
have by (13.1)

(4)  $0 \le K_1(\infty) - K_1(0) = \int_0^{t'} p(t)\,C_2\,dt + \int_{t'}^{\infty} p(t)\,[C_1 \varphi_2 + C_2 \varphi_3]\,dt$

where t' is the time of changeover from C to BC. Keeping x0 fixed, let
y0 → 0. This entails t' → ∞. Since C1 φ2 + C2 φ3 is uniformly bounded,
the second integral tends to zero. We have then, using the expressions for
x, y, p obtained from a C-policy,

(5)  $\lim_{y_0 \to 0} \int_0^{t'} e^{-q_3 t}\,[q_1 r_4\,y_0 e^{-r_4 t} - (q_3 r_1 - q_1 r_3)\,x_0 e^{-r_3 t}]\,dt \ge 0$

or

(6)  $-\int_0^{\infty} (q_3 r_1 - q_1 r_3)\,x_0\,e^{-(q_3 + r_3) t}\,dt = -\dfrac{(q_3 r_1 - q_1 r_3)\,x_0}{q_3 + r_3} \ge 0,$

which contradicts the assumption that the line C2 = 0 passes through
the positive quadrant.

This completes the consideration of the case D > 0 when both C2 =
0 and C3 = 0 are contained in the positive quadrant. The complete result
is

Theorem 2. If D = q1 r2 r3 + q2 r1 r4 − q3 r1 r2 > 0, the solution to the
problem of maximizing f(∞) subject to (12.1) is given schematically by Fig. 7.
It does not seem possible to specify L in any simple way.

Finally, let us discuss the degenerate cases in which C3 = 0 or C2 = 0
do not lie in the positive quadrant. If C3 = 0 lies outside, the C-region
extends all the way to the y-axis.

§  16.  D  <  0 

Let us now consider the case in which D < 0. In this case it turns out
that C is never used, which means that the solution is as given in the
two-choice problem.

Lemma  11.  B  is  used  near  the  y-axis. 

Proof:  Precisely  as  before. 

Lemma 12. The lower boundary of the B-region adjoining the y-axis is
C1 = 0. On that line AB is used. Below the line B is not used.

Proof: As in the case D > 0 we conclude that a B-policy must be followed
by one of the mixed policies AB or BC. However, in the present case
where D < 0, the mixed policy BC cannot be used in an optimal strategy.
For when BC is used, we have

(1)  $K_1'(t) = p\,[C_1 \varphi_2 + C_2 \varphi_3] < 0$

because C3 = 0 is below C1 = 0 and C2 = 0. Also K1(∞) = K2(∞)
= K3(∞) = 0, and K2'(t) = K3'(t) = 0 when the mixed policy BC is
used. Hence K1(t) > K2(t) = K3(t) when the BC-mix is used. This,
however, is a contradiction since it implies that φ1 = 1, φ2 = φ3 = 0.
Hence, a B-policy must be followed by use of AB on C1 = 0.

Again the same argument as above shows that B is not used below
C1 = 0.

Lemma 12. A is used in the entire region between C1 = 0 and the x-axis.

Proof: First, C is not used just before the AB-mixture. While AB is
employed, K1'(t) = K2'(t) = 0, and K3'(t) = p[−C2 φ1 + C3 φ2] > 0, as
can be seen from Fig. 7. It follows that K3 < K2 and K3 < K1
immediately before the changeover to AB occurs. Hence C is not used
immediately before AB.

It follows then that there is a region below C1 = 0 and adjoining this
line, where A is used. However, it is impossible to use another choice
before A in an optimal policy. When A is used below C1 = 0, we have

(2)  $K_1'(t) = 0, \quad K_2'(t) = -p\,C_1 > 0, \quad K_3'(t) = -p\,C_2 > 0$

Hence, K1 is the largest for all smaller t, and the A-region extends to the
x-axis.

Collecting  the  above  results,  we  have 

Theorem 3. If D = q1 r2 r3 + q2 r1 r4 − q3 r1 r2 < 0, the solution to the
problem of maximizing f(∞) never uses a C-policy and has the two-choice
form:

[Figure 8: the positive quadrant divided by the line C1 = 0; A is used between this line and the x-axis.]

§ 17. The case r3 = r4

Some of the preceding arguments fail in this case because the C-policy
keeps the slope y/x constant. It follows from (14.3b) and (14.3c) that
neither of the mixed policies AC or BC is ever used.

Let  us  first  of  all  show  that  if  D  <  0,  C  is  never  used.  To  do  this  we 
compare  the  result  of  using  AB  repeatedly  with  that  obtained  from  using 
C. 

When  AB  is  used  continually,  an  easy  calculation  yields 

(1)  fAB  (oo)  =  —J—  (Xo  +  yo) 

r  +  s 


where 
Similarly the result of using C continually is

(3)  $f_C(\infty) = \dfrac{r_3}{q_3 + r_3}\,(x_0 + y_0)$

The inequality f_AB(∞) > f_C(∞) is equivalent to D < 0.
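This equivalence can be verified directly. The derivation below is a sketch resting on two assumptions: that a continued AB mixture keeps the slope y/x fixed, so that φ1 r1 = φ2 r2 with φ1 + φ2 = 1, and that D = q1 r2 r3 + q2 r1 r4 − q3 r1 r2 with r4 = r3. The abbreviations \bar r and \bar q are introduced here only for the calculation.

```latex
\[
\varphi_1 = \frac{r_2}{r_1 + r_2}, \qquad
\varphi_2 = \frac{r_1}{r_1 + r_2}, \qquad
\bar r = \frac{r_1 r_2}{r_1 + r_2}, \qquad
\bar q = \frac{q_1 r_2 + q_2 r_1}{r_1 + r_2},
\]
\[
f_{AB}(\infty) = \int_0^{\infty} e^{-\bar q t}\,\bar r\,(x_0 + y_0)\,e^{-\bar r t}\,dt
 = \frac{\bar r}{\bar q + \bar r}\,(x_0 + y_0)
 = \frac{r_1 r_2}{q_1 r_2 + q_2 r_1 + r_1 r_2}\,(x_0 + y_0),
\]
\[
f_{AB}(\infty) > f_C(\infty)
\iff r_1 r_2\,(q_3 + r_3) > r_3\,(q_1 r_2 + q_2 r_1 + r_1 r_2)
\iff q_1 r_2 r_3 + q_2 r_1 r_3 - q_3 r_1 r_2 < 0 .
\]
```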

If D > 0, the above argument proves that no mixed policies are
pursued. Different cases arise depending upon which of the lines C2 = 0, C3
= 0 pass through the positive quadrant. As before, it can be established
that if C3 = 0 is in the positive quadrant, it is better to use B rather than C
near the y-axis. Let us now determine where the changeover from B to C
can be made. Let t0 be the time of changeover. For t0 < t < ∞, we
have

(4)  $K_1'(t) = p\,C_2, \quad K_2'(t) = -p\,C_3, \quad K_3'(t) = 0$

Also, we must have K1(t0) ≤ K2(t0) = K3(t0). Using again the remark
that K1(∞) = K2(∞) = K3(∞), we see that for t ≥ t0, we must have
C3 = 0. Thus, B is followed until the line C3 = 0 is encountered and then
C is followed. In this degenerate case C plays the role of BC. Similarly,
changeover from A to C occurs when C2 = 0 is reached. If C3 = 0 does not
lie within the positive quadrant, C is used up to the y-axis. If C2 = 0 does
not lie within, C is used up to the x-axis.

§  18.  Nonlinear  utility — two-choice  problem 

Let us now consider briefly the two-choice problem discussed in §§ 6-10
under the condition that we wish to maximize the expected value of some
function u of the total return R.

In view of the results obtained for the discrete problem, or rather of
the lack of results, it is somewhat surprising to find that for every utility
function u, which is strictly increasing and has a continuous derivative,
the optimal policy is precisely the same as that for the linear utility
problem solved above. This alone should be sufficient to warn the
unwary that continuous versions should not be used without close
attention to the kind of approximation they afford.

Since  any  monotone-increasing  utility  function  can  be  approximated 
arbitrarily  closely  by  a  function  of  the  above  type,  it  follows  that  this 
policy  is  optimal  for  any  monotone-increasing  utility  function,  although 
not  necessarily  unique.  A  function  of  this  class  of  great  theoretical  and 
practical  importance  is 

(1)  $u(R) = 0$ for $0 \le R < R_0$,

     $u(R) = 1$ for $R \ge R_0$.

The expected value of u(R) is the probability that R is greater than or
equal to R0.

Let  the  variables  have  their  previous  connotations;  we  obtain  as 
before 

(2)  $dx/dt = -\varphi_1(t)\,r_1\,x(t), \quad x(0) = x_0$

     $dy/dt = -\varphi_2(t)\,r_2\,y(t), \quad y(0) = y_0$

     $dp/dt = -p(t)\,[\varphi_1(t)\,q_1 + \varphi_2(t)\,q_2], \quad p(0) = 1$


Let z(t) = x0 + y0 − x(t) − y(t), the quantity which represents the total
amount of gold mined up to t if the machine has survived up to this time.
The expected value of u(R) is given by the integral

(3)  $G = -\int_0^{\infty} u(z(t))\,dp(t)$


This is most easily seen by considering that we are paid for the total
amount of gold that the machine has mined up to the time that the machine
is damaged.
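The integral for G can be evaluated numerically for a simple policy. The sketch below is illustrative only: it assumes a continued A-policy (φ1 = 1), so that x(t) = x0 exp(-r1 t), y stays at y0, and p(t) = exp(-q1 t); the function name and the sample numbers are hypothetical. For the linear utility it should reproduce r1 x0/(q1 + r1), and for the step utility (1) it should give the probability that at least R0 is mined before the machine is damaged.

```python
import math

def expected_utility_under_A(u, x0, q1, r1, horizon=100.0, steps=200_000):
    """Midpoint-rule value of G = -integral of u(z(t)) dp(t) for phi_1 = 1."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        z = x0 * (1.0 - math.exp(-r1 * t))        # gold mined up to time t
        minus_dp = q1 * math.exp(-q1 * t) * dt    # -dp(t), since p = exp(-q1 t)
        total += u(z) * minus_dp
    return total

x0, q1, r1, R0 = 1.0, 0.3, 0.5, 0.5   # hypothetical sample values

linear = expected_utility_under_A(lambda z: z, x0, q1, r1)
step = expected_utility_under_A(lambda z: 1.0 if z >= R0 else 0.0, x0, q1, r1)

# Closed forms under the same assumptions, for comparison.
linear_exact = r1 * x0 / (q1 + r1)
step_exact = (1.0 - R0 / x0) ** (q1 / r1)   # p(t*) with z(t*) = R0
print(linear, linear_exact, step, step_exact)
```

The step-utility value is only first-order accurate in the time step, because the integrand jumps at the instant when z(t) reaches R0.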

Our aim is to find the functions φ1(t), φ2(t) subject to the constraints

(4)  $0 \le \varphi_i \le 1, \quad \varphi_1 + \varphi_2 = 1$

which maximize G.

Pursuing the same perturbation techniques as above, we obtain after
some straightforward calculation


(5)  $\bar G - G = \varepsilon \int_0^{\infty} [K_1(t)\,\psi_1(t) + K_2(t)\,\psi_2(t)]\,dt + o(\varepsilon)$

where 

(6)  $K_1 = q_1\,p(t)\,u(z(t)) - \int_t^{\infty} [p'(s)\,u'(z(s))\,r_1\,x(s) - q_1\,p'(s)\,u(z(s))]\,ds$

     $K_2 = q_2\,p(t)\,u(z(t)) - \int_t^{\infty} [p'(s)\,u'(z(s))\,r_2\,y(s) - q_2\,p'(s)\,u(z(s))]\,ds$

Furthermore, 


(7)  $K_1'(t) - K_2'(t) = p(t)\,u'(z(t))\,[q_1 r_2\,y(t) - q_2 r_1\,x(t)]$

It  follows  that  if  we  assume  that  u'  (z)  >  0  when  z  >  0,  the  arguments 
and  results  of  the  linear  case  carry  over  with  very  slight  modifications. 
A New Formalism in the Calculus of Variations

§  1.  Introduction 

In two previous chapters, in our treatment of multi-stage production
processes, we encountered the problem of maximizing the functional
(x(T), a) over all functions z(t) subject to the relations

(1)  a. $dx/dt = Az, \quad x(0) = c,$
     b. $Bz \le Cx,$
     c. $z \ge 0.$

Utilizing the fact that the maximum, which we assume is attained, is a
function only of the initial vector c and the duration of the process T, we
obtained a functional equation for $f(c, T) = \max_z\,(x(T), a)$, which we
converted into a partial differential equation. As we mentioned at the
end of Chapter 7, this same approach is equally available for the study of
other classes of problems in the calculus of variations.

We shall pursue the investigation in this chapter, devoting our attention
to two particular classes of problems. The first is that of determining
the maximum or minimum of functionals of the form

(2)  $J(z) = \int_0^T F(x_1, \ldots, x_n, z_1, \ldots, z_m)\,dt,$

subject to relations and constraints of the form

(3)  a. $dx_i/dt = G_i(x, z), \quad x_i(0) = c_i, \quad i = 1, 2, \ldots, n,$
     b. $R_k(x, z) \le 0, \quad k = 1, 2, \ldots, l.$

The second is the eigenvalue problem associated with the equation

(4)  $u'' + \lambda^2 \varphi(t)\,u = 0, \quad u(0) = u(1) = 0.$

Since this problem is, under reasonable assumptions concerning φ(t),
equivalent to the problem of determining the relative minima of

(5)  $J(u) = \int_0^1 u'^2\,dt$

subject to the constraints

(6)  a. $\int_0^1 \varphi(t)\,u^2\,dt = 1,$
     b. $u(0) = u(1) = 0,$

we have a problem closely related to that described in equations (2) and
(3). The two-point boundary condition, however, introduces features of
novelty and difficulty.

Following our usual approach, we shall introduce suitable state
variables and derive a functional equation for the minimum of J(u) as a
function of these variables. The limiting form of this functional equation
will be a partial differential equation.

We shall then turn to a discussion of the numerical solution of these
equations. After indicating the conventional solution by means of partial
difference equations, we shall show how difference equations can enter
along another route. The importance of this alternate approach lies in the
fact that it enables us to bypass a number of thorny analytic difficulties
native to the domain of the calculus of variations. It also enables us to
avoid a number of difficulties associated with the stability of
computational techniques.

Using this approach, we shall consider also some problems involving a
Čebyšev functional

$J(z) = \max_{0 \le t \le T} F(x_1, x_2, \ldots, z_m)$

In any case, we shall throughout the chapter consistently adopt a
purely formal viewpoint. In this introductory, expository account we
are primarily interested in presenting the basic principles of the
functional equation method. A rigorous account, necessarily of a higher level
of difficulty, will be reserved for the second volume.

§  2.  A  new  approach 

Before  embarking  upon  the  high  seas  of  analysis,  let  us  discuss  the 
basic  idea  of  this  new  approach  to  continuous  variational  problems. 

The  classic  technique  in  the  calculus  of  variations,  patterned  directly 
upon  the  finite  dimensional  techniques  of  calculus,  depends  upon  the 
concept  of  a  function  yielding  an  extremum  as  a  point  in  function  space, 
and  the  characterization  of  this  point  by  means  of  variational  properties. 

We shall instead consider the calculus of variations as consisting of a
particular class of multi-stage decision processes of continuous type. A
function yielding an extremum may then be considered to be a continuous
policy.

Let us give some simple examples which may serve to illustrate this
idea more clearly than any abstract discussion.

Example  1.  Determine  the  curve  connecting  two  points,  P  and  Q,  having 
the  property  that  a  particle  travelling  along  the  curve  under  the  influence  of 
gravity  will  go  from  P  to  Q  in  minimum  time. 


(the classical brachistochrone problem)

It  is  clear  that  along  an  extremal,  whatever  the  path  between  P  and 
some  intermediate  point  R,  the  path  between  R  and  Q  must  be  such  as  to 
minimize  the  time  required  to  traverse  RQ,  given  the  left-hand  velocity 
at  R. 

At each point on the curve, we determine a direction of motion, which
is to say a tangent to the curve. The optimal policy or extremal may be
expressed not only by means of an equation for y in terms of x, the
usual approach, but also by means of an equation for dy/dx in terms of y
and the given left-hand velocity at (x, y).

Example 2. Suppose that we are presented with the problem of drawing a
curve passing through P and Q, as in the figure below, of fixed length L,
which will include a maximum area in the curvilinear quadrilateral bounded
by the curve, the perpendiculars PP', QQ', and the segment P'Q' of the
x-axis.

It is clear that along an extremal, whatever the path between P and R,
and whatever the shaded area obtained in this way, the continuation
from R to Q must maximize the area RR'Q'Q subject to the restriction
that the curve RQ have length L − L'.

The optimal policy may be expressed by means of an equation for dy/dx
in terms of y and L − L', rather than by an equation for y in terms of x.

Both of the conclusions in these two examples are applications of the
“principle of optimality” discussed in Chapter 3, and applied in all of the
preceding chapters. The mathematical expression of this principle will
yield our new approach to the calculus of variations.




Figure  2  (the  classical  isoperimetric  problem) 


An advantage of this new approach lies in the fact that very often in
the determination of optimal policies for multistage processes, the
determination of the next move in terms of the current state of the process is
in many ways a simpler, more natural and even more important piece
of information than the determination of the entire sequence of moves in
an optimal policy to be followed from some fixed initial position.

Speaking in geometrical terms, we seek to determine the intrinsic
equations of extremal curves. In place of considering the curve as the
locus of points, we regard it as the envelope of tangents, a dual approach
to the classical treatment.1

In  general,  as  is  always  to  be  expected,  the  combination  of  the  two 
approaches,  local  and  global,  will  be  most  powerful,  since  some  aspects  of 
an  extremal  are  most  simply  described  in  point  coordinates,  and  others  in 
tangential  coordinates. 

We  shall  in  the  following  sections  apply  these  ideas  to  a  number  of 
representative  problems,  and  discuss  the  application  of  this  approach  to 
the  computation  of  solutions. 

§ 3. $\max_y \int_0^{\infty} F(x, y)\,dt$

In Chapter 1 we considered the discrete process which gave rise to the
functional equation

(1)  $f(x) = \max_{0 \le y \le x}\,[g(y) + h(x - y) + f(ay + b(x - y))], \quad f(0) = 0.$


1 In the terminology of game theory, there may be a considerable advantage
to viewing a process in its extensive rather than normal form. Essentially, only then
do we take full advantage of the intrinsic structure of a process and thus differentiate
it from other multi-stage processes and other multi-dimensional maximization
problems.
A continuous version of this process gives rise to the problem of
maximizing the functional

(2)  $J(y) = \int_0^{\infty} [g(y) + h(x - y)]\,dt,$

with respect to y(t), where

(3)  a. $dx/dt = -ay - b(x - y), \quad a, b > 0, \quad x(0) = c,$
     b. $0 \le y(t) \le x(t), \quad t \ge 0.$

Let us then, to introduce our method, consider as our first example the
problem of maximizing an integral of the form

(4)  $J(y) = \int_0^{\infty} F(x, y)\,dt,$

subject to the relation between x and y,

(5)  $dx/dt = G(x, y), \quad x(0) = c.$

To begin with, let us omit any constraint such as (3b).

Let us once again repeat that we shall proceed formally since we are
interested here only in presenting the mechanics of our approach. This is
to say, we shall consistently assume that maxima and minima exist, and
that the extremals possess the requisite differentiability properties we
shall need. The problem of establishing these properties rigorously is
quite distinct from that of deriving the formalism and will not be
considered here. Furthermore, as we shall indicate below, in a number of
cases, we can pursue a path which eliminates any necessity for obtaining
a priori results concerning the nature of the maximizing y.

Returning to the maximization problem posed above, we observe that
the maximum value of J(y) will be a function only of the initial value of
x, namely c. Let us therefore write

(6)  $\max_y J(y) = f(c),$

and proceed to derive a functional equation for f(c).

Let y = y(t) be a function yielding the maximum of J(y). We have
then

(7)  $f(c) = \int_0^S F(x, y)\,dt + \int_S^{\infty} F(x, y)\,dt,$

for any S > 0.

Consider the second integral. The effect of any initial choice of y(t),
for t in the interval [0, S], will be, by way of the differential equation of
(5), to convert c into the value of x at S, which we call c(S). It follows
then, that whatever the initial choice of y over [0, S], we will have over
the remaining interval, [S, ∞], a problem of precisely the same form as
the original, with the difference that c is now c(S) = x(S). Since the
integrand is independent of t, and also the differential equation, the new
interval may be considered to be [0, ∞], with x(0) = c(S).

It follows then, invoking the principle of optimality, that equation (7)
may be rewritten

(8)  $f(c) = \int_0^S F(x, y)\,dt + f(c(S)).$

Since the choice of the function y must be made so as to yield the
maximum value f(c), we obtain the basic functional equation

(9)  $f(c) = \max_{y[0, S]}\,\Big[\int_0^S F(x, y)\,dt + f(c(S))\Big],$

for any S > 0.

From this equation we shall derive a differential equation for f(c) by
letting S approach 0. For small S we have, under appropriate
assumptions of continuity,

(10)  $f(c) = \max_{y[0, S]}\,[F(c, y(0))\,S + f(c + S\,G(c, y(0))) + o(S)].$

As the interval [0, S] shrinks to zero, a choice of y over [0, S] becomes
ultimately a choice of y(0). Let us, for notational simplicity, set v = y(0).
Then (10) leads to

(11)  $f(c) = \max_v\,[F(c, v)\,S + f(c) + S\,G(c, v)\,f'(c)] + o(S),$

which in the limit as S → 0 yields

(12)  $0 = \max_v\,[F(c, v) + G(c, v)\,f'(c)].$

Applying calculus to determine this maximum, we obtain the two
equations

(13)  $0 = F(c, v) + G(c, v)\,f'(c),$

      $0 = F_v(c, v) + G_v(c, v)\,f'(c).$

Elimination of f'(c) between these two equations yields the
determinantal equation

(14)  $\begin{vmatrix} F(c, v) & G(c, v) \\ F_v(c, v) & G_v(c, v) \end{vmatrix} = 0,$

which determines v as a function of c.

Having determined v as a function of c, which is to say, y as a function
of x, we return to the differential equation of (5) and find x, and
subsequently y, as functions of t by solving the differential equation

(15)  $dx/dt = G[x, y(x)], \quad x(0) = c.$

From this we see that a relatively simple policy, y = φ(x), may yield a
relatively complicated extremal function, x = x(t).
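The procedure of this section can be tried on a small example. The sketch below is an illustration, not part of the text: it takes the hypothetical choices F(x, y) = -(x^2 + y^2) and G(x, y) = y, for which the determinantal equation reduces to v^2 - c^2 = 0, and it selects the root v = -c because that root drives x to zero so that the integral converges. A forward-Euler simulation then compares the resulting policy y = -x with two perturbed policies.

```python
def F(x, y):
    """Hypothetical rate of return for the example."""
    return -(x * x + y * y)

def G(x, y):
    """Hypothetical dynamics dx/dt = G(x, y) for the example."""
    return y

def determinant(c, v):
    """F*G_v - G*F_v for the example, where G_v = 1 and F_v = -2v."""
    return F(c, v) * 1.0 - G(c, v) * (-2.0 * v)

def total_return(policy, c, horizon=20.0, steps=20_000):
    """Forward-Euler value of the integral of F along dx/dt = G(x, policy(x))."""
    dt = horizon / steps
    x, total = c, 0.0
    for _ in range(steps):
        y = policy(x)
        total += F(x, y) * dt
        x += G(x, y) * dt
    return total

best = total_return(lambda x: -x, 1.0)          # policy read off from v = -c
slower = total_return(lambda x: -0.5 * x, 1.0)  # perturbed policy
faster = total_return(lambda x: -2.0 * x, 1.0)  # perturbed policy
print(determinant(1.0, -1.0), best, slower, faster)
```

For this example the policy is the simple rule y = -x while the extremal is x(t) = c exp(-t), and the return of the policy y = -kx works out to -(1 + k^2) c^2/(2k), which is largest at k = 1.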

§  4.  Discussion 

Let us take G(x, y) to be uniformly negative and equal to −A(x, y) so
that we may consider the above to represent a continuous allocation
process where the rate of return is F(x, y) and the rate of expenditure of
resources is A(x, y). Starting from the basic equation

(1)  $0 = \max_v\,[F(c, v) - A(c, v)\,f'(c)],$

we have, for all v,

(2)  $0 \ge F(c, v) - A(c, v)\,f'(c),$

and thus

(3)  $f'(c) \ge F(c, v)/A(c, v).$

Since there is equality for at least one value of v, we obtain the equation

(4)  $f'(c) = \max_v \dfrac{F(c, v)}{A(c, v)}.$

This  equation  tells  us  that  the  policy  which  maximizes  the  overall 
return  proceeds  locally  to  maximize  the  ratio  of  the  rate  of  return  to  the 
rate  of  expenditure  of  resources,  a  policy  we  have  encountered  before, 
cf.  Exer.  18  of  Chapter  1,  §  8  of  Chapter  2. 
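As a concrete illustration of this ratio rule, take the hypothetical rates F(c, v) = cv and A(c, v) = 1 + v^2 with c > 0; these functions are chosen only for the sake of the example. The maximizing v and the two stationarity conditions (those of (13) in § 3 with G = −A) can then be checked by hand:

```latex
\[
f'(c) = \max_{v} \frac{c\,v}{1 + v^2} = \frac{c}{2}, \qquad \text{attained at } v = 1,
\]
\[
F - A\,f' \,\Big|_{v=1} = c - 2\cdot\frac{c}{2} = 0, \qquad
F_v - A_v\,f' \,\Big|_{v=1} = c - 2\cdot\frac{c}{2} = 0 .
\]
```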

This is a very interesting interpretation of the Euler equation for
variational problems of the above simple form. We leave it to the reader to
verify that (14) of § 3 is a first integral of the Euler equation obtained in
the classical manner.

§  5.  The  two  dimensional  case 

We leave as an exercise the proof of the result that the same technique
applied to the problem of determining the maximum of

(1)  $\int_0^{\infty} F(x_1, x_2, y_1, y_2)\,dt,$

over all functions y1(t) and y2(t) subject to

(2)  $dx_1/dt = G(x_1, x_2, y_1, y_2), \quad x_1(0) = c_1,$

     $dx_2/dt = H(x_1, x_2, y_1, y_2), \quad x_2(0) = c_2,$

yields the determinantal equation

(3)  $\begin{vmatrix} F(c_1, c_2, u, v) & G(c_1, c_2, u, v) & H(c_1, c_2, u, v) \\ F_u & G_u & H_u \\ F_v & G_v & H_v \end{vmatrix} = 0,$

connecting the values y1(0) = u(c1, c2), y2(0) = v(c1, c2).

It is an open problem as to whether or not a solution to the above
variational problem can be obtained in the same form as in the
one-dimensional case, i.e., in the form y1 = φ1(x1, x2), y2 = φ2(x1, x2).

§ 6. $\max_y \int_0^T F(x, y)\,dt$

Let us now consider the more general problem of determining the
maximum of

(1)  $J(y) = \int_0^T F(x, y)\,dt$

subject to the relation connecting x and y,

(2)  $dx/dt = G(x, y), \quad x(0) = c.$

As we shall point out again below, there are certain advantages to
considering the finite problem, despite the complication caused by an
additional parameter.

The two state variables are now c and T. In many applications, c
represents the initial quantity of resources and T the duration of the
process. We now write

(3)  $\max_y J(y) = f(c, T).$

Employing precisely the same reasoning as in the previous section, we
obtain the functional equation

(4)  $f(c, T) = \max_{y[0, S]}\,\Big[\int_0^S F(x, y)\,dt + f(c(S), T - S)\Big],$

which leads, in the limit as S → 0, to the nonlinear partial differential
equation

(5)  $0 = \max_v\,[F(c, v) + G(c, v)\,f_c - f_T].$

This, in turn, leads to the simultaneous equations

(6)  $f_T = F(c, v) + G(c, v)\,f_c,$

     $0 = F_v(c, v) + G_v(c, v)\,f_c.$
Solving for f_c and f_T we have

(7)  $f_c = -F_v(c, v)/G_v(c, v) = P(c, v),$

     $f_T = F - G\,F_v/G_v = Q(c, v).$

To obtain an equation for v, the fundamental variable, we equate f_cT
with f_Tc and obtain the equation

(8)  $P_v\,v_T = Q_v\,v_c + Q_c.$

This is a first order quasi-linear partial differential equation for v = v(c, T),
which may be solved by means of the method of characteristics, a point
we shall mention again below in § 14, or by numerical means, given
v(0, T) or v(c, 0).
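The step from (7) to (8) is an application of the chain rule. Writing f_c = P(c, v(c, T)) and f_T = Q(c, v(c, T)) and assuming enough smoothness for the mixed partial derivatives to agree:

```latex
\[
\frac{\partial}{\partial T}\,f_c = \frac{\partial}{\partial T}\,P\big(c, v(c, T)\big) = P_v\,v_T ,
\qquad
\frac{\partial}{\partial c}\,f_T = \frac{\partial}{\partial c}\,Q\big(c, v(c, T)\big) = Q_c + Q_v\,v_c ,
\]
\[
f_{cT} = f_{Tc} \;\Longrightarrow\; P_v\,v_T = Q_v\,v_c + Q_c .
\]
```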

It  is  here  the  advantage  of  a  T-dependent  formulation  becomes  clear. 
Wc  can  determine  v  as  a  function  of  c  for  T  =  0  quite  readily,  since  for 
small  T,  we  have 

(9)  /  (c,  7')  =  Max  [F  (c,  v  )T  +  o  (T)] . 

V 

Consequently, for T = 0, v = v(c, 0) is determined by the condition
that it maximizes F(c, v).
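
The elimination leading from the functional equation to the equation for v can be traced on a small worked example. The particular choice $F(c, v) = c - v^2/2$, $G(c, v) = v$ below is an assumption made purely for illustration and does not come from the text; it is picked because every step can be carried out by hand.

```latex
\begin{align*}
F(c,v) &= c - \tfrac12 v^2, \qquad G(c,v) = v, \\
P(c,v) &= -F_v/G_v = v, \qquad
Q(c,v) = F - G\,F_v/G_v = c + \tfrac12 v^2, \\
P_v v_T &= Q_v v_c + Q_c
  \;\Longrightarrow\; v_T = v\,v_c + 1, \\
v(c,0) &= \arg\max_v F(c,v) = 0
  \;\Longrightarrow\; v(c,T) = T, \\
f_c &= P = T, \quad f_T = Q = c + \tfrac12 T^2
  \;\Longrightarrow\; f(c,T) = cT + \tfrac16 T^3 .
\end{align*}
```

Substituting back, $F + G f_c - f_T = c - v^2/2 + vT - c - T^2/2$, whose maximum over v is attained at v = T and equals zero, as the partial differential equation requires.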

§ 7.  $\max_y \int_0^T F(x, y)\,dt$ under the Constraint $0 \le y \le x$

Let us now consider the problem of determining the maximum of
$J(y) = \int_0^T F(x, y)\,dt$ subject to the relations

(1)  (a)  $dx/dt = G(x, y)$,  $x(0) = c$,

     (b)  $0 \le y \le x$.

As  far  as  the  classical  approach  is  concerned,  the  difficulty  of  the 
problem  resides  in  the  fact  that  y  cannot  be  determined,  in  general,  by 
means  of  an  unrestricted  variation.  When  0  <  y  <  x,  we  may  vary 
freely,  and  in  intervals  where  this  inequality  holds,  y  must  satisfy  the 
Euler equation. However, when y = 0 or x, we merely have an Euler
inequality. The heart of the problem lies in determining how to fit
together the three types of solution, y = 0, y = x, and y a solution of the
Euler  equation.  This  is  equivalent  to  determining  the  transition  points 
where  two  types  of  solution  join. 

At  the  present  time,  there  exists  no  uniform  technique  for  solving 
these  problems  in  explicit  analytic  form.  Certain  classes  of  problems  of  this 
type do have a simple structure to their solution, as we shall briefly
discuss  below. 
Let us now see how the functional equation technique applies to this
problem. Define

(2)  $f(c, T) = \max_y J(y)$.

As  above,  we  derive  the  partial  differential  equation 

(3)  $f_T = \max_{0 \le v \le c} \left[ F(c, v) + G(c, v) f_c \right]$.

The original constraint $0 \le y \le x$ has been translated into the
constraint $0 \le v \le c$. The initial condition is

(4)  $f(c, 0) = 0$

for all c.

We see that the constraint $0 \le v \le c$ prevents us from differentiating
freely  with  respect  to  v.  In  §  10,  we  shall  show  how  (3)  can  be  used  to 
derive  the  structure  of  the  solution,  under  certain  assumptions  concerning 
F  and  G. 


§ 8.  Computational solution

Let  us  examine  the  nonlinear  partial  differential  equation 

(1)  $f_T = \max_v \left[ F(c, v) + G(c, v) f_c \right]$,

with $f(c, 0) = 0$, and sketch a procedure that may be used to compute the
solution.

In  place  of  allowing  the  variables  T  and  c  continuous  variation,  we 
restrict  their  range  to  the  set  of  values 

(2)  $T = 0, \Delta, 2\Delta, \ldots, k\Delta, \ldots$

     $c = 0, \pm\delta, \pm 2\delta, \ldots, \pm k\delta, \ldots$

where $\Delta$ and $\delta$ are both positive quantities.

The partial derivatives $f_T$ and $f_c$ are now approximated by the
difference quotients

(3)  $f_T \approx \dfrac{f(c, T + \Delta) - f(c, T)}{\Delta}$,

     $f_c \approx \dfrac{f(c + \delta, T) - f(c - \delta, T)}{2\delta}$,


with the result that the nonlinear differential equation in (1) assumes the
approximate form

(4)  $f(c, T + \Delta) = f(c, T) + \Delta \max_v \left[ F(c, v) + G(c, v)\, \dfrac{f(c + \delta, T) - f(c - \delta, T)}{2\delta} \right]$,

     $f(c, 0) = 0$.

Starting with the known values for $f(c, 0)$ we can compute successively
the values of $f(c, \Delta)$, $f(c, 2\Delta)$, ..., and so on.
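
The marching procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended solver: the function name `solve_pde`, the finite grid in c, the one-sided differences used at the two ends of that grid, and the replacement of the maximum over v by a search over a finite list of candidate values are all choices made here for the sketch. The test problem $F = c - v^2/2$, $G = v$ is likewise hypothetical; it is chosen because one can verify by substitution that $f(c, T) = cT + T^3/6$ satisfies (1) with $f(c, 0) = 0$.

```python
def solve_pde(F, G, c_grid, v_grid, dt, n_steps):
    """Explicit marching scheme for f_T = max_v [F(c,v) + G(c,v) f_c], f(c,0) = 0.

    c_grid is an evenly spaced list of c values, v_grid a finite list of
    candidate controls (an assumption: the text maximizes over all v).
    Returns the list of values f(c_i, n_steps * dt).
    """
    dc = c_grid[1] - c_grid[0]
    last = len(c_grid) - 1
    f = [0.0] * len(c_grid)              # initial condition f(c, 0) = 0
    for _ in range(n_steps):
        new_f = []
        for i, c in enumerate(c_grid):
            # symmetric difference for f_c in the interior,
            # one-sided differences at the ends (assumption of this sketch)
            if i == 0:
                fc = (f[1] - f[0]) / dc
            elif i == last:
                fc = (f[last] - f[last - 1]) / dc
            else:
                fc = (f[i + 1] - f[i - 1]) / (2.0 * dc)
            best = max(F(c, v) + G(c, v) * fc for v in v_grid)
            new_f.append(f[i] + dt * best)   # forward difference in T
        f = new_f
    return f
```

With a time step of 0.01, a spacing of 0.1 in c and candidate controls spaced 0.01 apart, a hundred steps of this scheme on the test problem should reproduce $cT + T^3/6$ at T = 1 to within the first-order error of the forward difference in T.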

Although this method is conceptually very simple, there are great
difficulties encountered in actual computing practice. Essentially the
main question is how to choose the quantities $\Delta$ and $\delta$. The convergence
of the process and the stability of the numerical solution depend upon
the proper choice of these parameters. For the linear equations that appear
when the maximum is removed, there is a fairly complete and satisfying
theory of these matters. For nonlinear equations, however, practically
no theory exists and the matter rests in the realm of art and
experience.*

It  is  interesting  to  observe  that  the  numerical  solution  of  (7.3),  an 
equation  with  a  constraint,  is  easier  to  obtain  than  the  numerical  solution 
of  (1)  above,  due  to  the  fact  that  the  existence  of  the  constraint  narrows 
the range that must be examined to determine the maximum. Consequently,
in many cases, the more realistic process will possess a simpler
computational  solution. 

In  §  11  we  shall  discuss  an  alternate  computational  scheme,  also  based 
upon  difference  equations,  which  in  practice  seems  to  be  more  efficient 
and  which  enables  us  to  proceed  in  a  rigorous  fashion,  without  having  to 
enter  difficult  domains  of  the  calculus  of  variations. 


§  9.  Discussion 

We  have  mentioned  above  the  difficulties  that  may  arise  in  solving  a 
variational  problem  subject  to  restraints,  and  also  the  fact  that  certain 
cases may be completely solved.

Let  us  show  how  the  functional  equation  in  (8.1)  may  be  used  to  yield 
information  concerning  the  structure  of  the  solution.  We  shall  consider 
only the case where F(c, v) is strictly concave in v for all c, and G(c, v)
is linear in v. The nonlinear partial differential equation then has the
form

(1)  $f_T = \max_{0 \le v \le c} \left[ F(c, v) + (g(c) + h(c) v) f_c \right]$.


* There is also the problem of choosing a suitable difference-quotient
approximation. In (3), we choose a symmetric approximation for $f_c$ and an
asymmetric one for $f_T$. For the case of linear equations, stability
considerations may often be helpful. For nonlinear equations, practically
nothing is known.
The function $F(c, v) + (g(c) + h(c) v) f_c$ is strictly concave in v for all
values of c and T, and the maximum over v is uniquely assumed. It may,
however, occur at v = 0, v = c or at an interior point.

Assuming  that  all  the  functions  involved  vary  continuously  with  c  and 
T ,  we  can  make  the  following  important  observation.  Since  the  function 
$F(c, v) + (g(c) + h(c) v) f_c$ varies continuously and is strictly concave,
the maximum cannot shift from v = 0 to v = c without passing through
interior  points  of  the  interval  [0,  c]  first. 

This  is  a  particular  case  of  the  fact  that  the  maximum  value  for  v 
depends  continuously  upon  c  and  T.  This  remark  can  be  used  to  shorten 
greatly the time involved in the computational solution of these processes,
and furthermore, it makes feasible the numerical solution of
multi-dimensional processes.3

It  follows  that  any  extremal  must  have  the  following  structure.  An 
interval  where  y  =  0  must  be  followed  and  preceded  by  an  interval  in 
which 0 < y < x, and similarly for an interval where y = x.

The  question  arises  as  to  how  often  the  solution  can  switch  from  one 
type to another. In order to answer this, we must make further
assumptions concerning the functions which appear. It is not difficult to
construct examples showing that there may be an arbitrarily large number
of such transitions if F is chosen suitably. In the example considered
in  the  next  section  we  will  carry  through  the  discussion  in  greater  detail. 

§  10.  An  example 

Let  us  consider  the  problem  of  determining  the  maximum  of 

(1)  $J(y) = \int_0^T (x - y)\,dt$

under  the  conditions 

(2)  a.  $dx/dt = b(y)$,  $x(0) = c$,

     b.  $0 \le y \le x$.

The  basic  equation  is 

(3)  $f_T = \max_{0 \le v \le c} \left[ c - v + b(v) f_c \right]$.

Let  us  now  assume  that  b  (y)  satisfies  the  conditions 

(4)  a.  $b(0) = 0$,  $b'(0) = \infty$,

     b.  $b'(y) > 0$,  $b'(y) \to 0$ as $y \to \infty$,

     c.  $b''(y) < 0$.

3 Cf. the remarks in § 22 and § 23 of Chapter 1.
A simple function satisfying these conditions is $y^{1/2}$.

Let us assume, as is quite plausible, that $f_c > 0$.* Then, turning to the
determination of the maximum of $K(v) = c - v + b(v) f_c$, we see that
the derivative with respect to v, $K'(v) = -1 + b'(v) f_c$, is positive for
small v, negative for large v and zero for just one value of v. Let us
furthermore assume, as is also plausible in this case, that $f_c = 0$ at T = 0
and is monotone increasing thereafter as a function of T.

If we allow v to traverse the interval $0 \le v < \infty$, we see that there
will always be a solution of $K'(v) = 0$. However, if v is constrained by the
condition that $0 \le v \le c$, then if $f_c$ is large, which is to say, if T is large,
$K'(v)$ will remain positive throughout the interval $0 \le v \le c$. This
means that the maximum will be at v = c, or y = x, for T large compared
to c.

It  remains  to  determine  the  transition  curve  T  =  T  (c)  at  which  this 
cross-over  in  policy  occurs.  We  know  that  the  solution  will  have  the  form 

(5)  a.  $y = x$,  $0 \le t \le t_1$,

     b.  $0 < y < x$,  $t_1 < t \le T$.

The first part of the curve, where y = x, will appear only if T is
sufficiently large. If T is small, the solution will consist only of the second part,
where 0 < y < x.

Consider  then  the  case  where  T  is  small.  There  are  two  courses  we  may 
pursue.  We  may  first  use  the  fact  that  the  maximum  in  (3)  occurs  inside 
the  interval,  which  means  that  (3)  is  equivalent  to  the  two  equations 

(6)  $f_T = c - v + b(v) f_c$,

     $0 = -1 + b'(v) f_c$.

These  equations,  combined  with  the  boundary  values 

(7)  $f(c, 0) = 0$,  $v(c, 0) = 0$,

suffice to determine f(c, T), for T small.

Alternatively,  we  may  use  the  classical  variational  technique,  armed 
with  the  knowledge  that  we  can  ignore  the  constraint  0  <  y  <  x.  Setting 

(8)  $J(y) = \int_0^T \left[ c + \int_0^t b(y)\,ds - y \right] dt$,

we  readily  obtain  as  the  variational  equation,  the  Euler  equation 

(9)  (T  —  t)  b'  (y)  —  1  =  0 . 

With y determined uniquely by this equation, we can compute J(y) for
the extremal and thus f(c, T).

4  In  the  following  section,  we  shall  show  how  these  results  may  be  derived  by 
a  consideration  of  the  discrete  process. 
As T increases, the critical value of T, as a function of c, is furnished
by the value for which the equation

(10)  $-1 + b'(v) f_c = 0$

has the solution v = c, which is to say the value of T furnished by the
equation

(11)  $f_c(c, T) = 1/b'(c)$.

If $f_c(c, T)$ is monotone increasing as a function of T, as surmised, this
equation has one root T(c). Once we have determined this critical value
the solution is completely determined.

§  11.  A  discrete  version 

One  of  the  methods  we  can  employ  to  make  the  above  arguments 
rigorous  is  based  upon  the  discrete  approximation  to  the  continuous 
problem.4  Considering  the  problem  above  in  §  10,  a  discrete  version  is  the 
problem  of  determining  the  maximum  of 

(1)  $J(y) = J(y_0, y_1, \ldots, y_N) = \sum_{k=0}^{N} (x_k - y_k)$

over all $y_k$ subject to the relations

(2)  a.  $x_{k+1} = x_k + b(y_k)$,

     b.  $0 \le y_k \le x_k$,  $k = 0, 1, 2, \ldots, N$.

If  we  set 

(3)  $u_N(c) = \max_y J(y)$,

we obtain the recurrence relations

(4)  a.  $u_0(c) = c$,

     b.  $u_{N+1}(c) = \max_{0 \le v \le c} \left[ c - v + u_N(c + b(v)) \right]$,  $N = 0, 1, \ldots$.

Using  the  same  methods  we  have  employed  in  §  12  of  Chapter  1,  and  in 
our  discussion  of  the  optimal  inventory  equation,  it  is  easy  to  establish 
the  following  result : 

Theorem 1. For each $N \ge 1$, there exists a function $v_N(c)$ with the
following properties:

i  They  may  also  be  rigorously  established  using  classical  techniques.  A  reference 
will  be  found  in  the  bibliography  at  the  end  of  the  chapter. 
(5)  a.  $v_N(c)$ is monotone decreasing as c increases,

     b.  $v_{N+1}(c) > v_N(c)$,  $N = 1, 2, \ldots$

     c.  There is a unique solution to $v_N(c) = c$ which we call
         $c_N$;  $c_{N+1} > c_N$.

     d.  For $0 \le c \le c_N$,  $u_N(c) = u_{N-1}(c + b(c))$, i.e. v = c.

     e.  For $c_N \le c$,  $u_N(c) = c - v_N(c) + u_{N-1}[c + b(v_N(c))]$,

     f.  $u_N'(c) \ge u_{N-1}'(c)$,  $N = 1, 2, \ldots$, for $c \ge 0$.


The  proof,  which  is  inductive  along  the  usual  lines,  we  leave  to  the 
reader. 
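
The recurrence relations in (4) lend themselves to direct tabulation, and the following Python sketch may help in visualizing the policy described in Theorem 1. It is a rough illustration under stated assumptions rather than the author's procedure: the names `interp` and `discrete_stages`, the uniform grid in c, the piecewise-linear interpolation (with linear extrapolation beyond the last grid point), and the restriction of v to grid values are all introduced here. The growth function is supplied by the caller; the square root is used in the usage note as one function with b(0) = 0, b' > 0 and b'' < 0.

```python
import math

def interp(values, dc, c):
    """Piecewise-linear value of a tabulated function at c >= 0.

    values[i] is the value at c = i * dc; beyond the last node the last
    segment is extended linearly (an assumption of this sketch).
    """
    i = min(int(c / dc), len(values) - 2)
    w = c / dc - i
    return values[i] + w * (values[i + 1] - values[i])

def discrete_stages(b, n_stages, c_max=4.0, dc=0.01):
    """Tabulate u_N and the maximizing v from
         u_0(c) = c,
         u_{N+1}(c) = max over 0 <= v <= c of [c - v + u_N(c + b(v))].
    Returns (grid, u, policy) after n_stages applications of the recurrence.
    """
    n = int(round(c_max / dc)) + 1
    grid = [i * dc for i in range(n)]
    u = list(grid)                        # u_0(c) = c
    policy = [0.0] * n
    for _ in range(n_stages):
        new_u, policy = [], []
        for i, c in enumerate(grid):
            best_val, best_v = -math.inf, 0.0
            for j in range(i + 1):        # candidate v = 0, dc, ..., c
                v = grid[j]
                val = c - v + interp(u, dc, c + b(v))
                if val > best_val:
                    best_val, best_v = val, v
            new_u.append(best_val)
            policy.append(best_v)
        u = new_u
    return grid, u, policy
```

Calling `discrete_stages(math.sqrt, 1)` and `discrete_stages(math.sqrt, 2)` gives tables in which the maximizing v equals c for small c and is constant for larger c. A hand calculation for this particular b suggests the switch occurs near c = 1/4 after one stage and near c = 1 after two, in line with properties (5c) and (5d).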

A similar result can be obtained for the more general case, corresponding
to the problem in § 7, if we impose suitable conditions on F(x, y)
and G(x, y). The proof is much more detailed.

As we saw in § 7-8, the problem of determining the maximum of
$J(y) = \int_0^T F(x, y)\,dt$ subject to the relations

(6)  (a)  $dx/dt = G(x, y)$,  $x(0) = c$,

     (b)  $0 \le y \le x$,

can be reduced to the problem of solving the nonlinear partial
differential equation

(7)  $f_T = \max_{0 \le v \le c} \left[ F(c, v) + G(c, v) f_c \right]$,  $f(c, 0) = 0$.

This  equation  may  be  approached  numerically  by  converting  it  into  a 
partial  difference  equation. 

In  order  to  use  this  method  with  confidence,  we  must  first  establish  the 
fact  that  the  variational  problem  is  equivalent  to  solving  this  nonlinear 
equation,  a  matter  of  some  difficulty  when  constraints  are  imposed,  and 
then  that  the  finite  difference  method  yields  an  approximate  solution  to  the 
nonlinear  equation,  again  a  complicated  question.  Both  of  these  problems 
may  be  avoided  in  the  following  way.  We  replace  the  original  problem  by 
the problem of determining the maximum over $\{y_k\}$ of the function

(8)  $J(\{y_k\}) = \Delta \sum_{k=0}^{N} F(x_k, y_k)$,

subject to the relations

(9)  (a)  $x_{k+1} = x_k + \Delta\, G(x_k, y_k)$,  $x_0 = c$,

     (b)  $0 \le y_k \le x_k$,  $k = 0, 1, 2, \ldots, N$,

where $x_k = x(k\Delta)$, $y_k = y(k\Delta)$, $N\Delta = T$.
Setting

(10)  $f_N(c) = \max_{\{y_k\}} J(\{y_k\})$,

we replace the above maximization problem by the recurrence relations

(11)  $f_0(c) = 0$,

      $f_{N+1}(c) = \max_{0 \le v \le c} \left[ \Delta F(c, v) + f_N(c + \Delta\, G(c, v)) \right]$.

In  cases  treated  to  date  this  has  turned  out  to  be  a  more  reliable 
computational  procedure,  and  it  possesses  a  number  of  other  attractive 
features  from  the  numerical  point  of  view  as  well. 

It  turns  out  to  be  not  too  difficult  to  show  that 

(12)  $\lim_{\Delta \to 0} f_N(c) = f(c, T)$,

under conditions upon F and G that are normally assumed in the calculus
of variations. Actually, these conditions can be greatly lightened. However,
any discussion of this would take us too far afield.

§  12.  A  convergence  proof 

Since  a  discussion  of  the  convergence  question,  even  under  strong 
assumptions,  becomes  quite  long-winded  in  its  full  generality,  without 
adding  much  in  principle,  we  shall  content  ourselves  with  the  proof  of  a 
typical  result. 

Let  us  set 

(1)  $f(c, T) = \max_y \int_0^T F(x, y)\,dt$,

subject to the constraints

(2)  a.  $dx/dt = G(x, y)$,  $x(0) = c$,

     b.  $0 \le y \le x$.

It is convenient to set $y = \varphi x$,* so that we have, introducing a new F
and G,

(3)  $f(c, T) = \max_\varphi \int_0^T F(x, \varphi)\,dt$,
* This is particularly so in the numerical calculation since this change of
independent variable permits the maximization to be over a fixed region, $0 \le \varphi_k \le 1$,
rather than over a variable region. On the other hand, there are cases where the
variable region is desirable, particularly in connection with shrinking processes.
where

(4)  a.  $dx/dt = G(x, \varphi)$,  $x(0) = c$,

     b.  $0 \le \varphi \le 1$ for all $t \ge 0$.

Let us now define, for n = 1, 2, ..., a sequence of approximating
problems: Maximize

(5)  $J_N(\{\varphi_k\}, n) = \sum_{k=0}^{N} F(x_k, \varphi_k)/n$,  $N = [T/n]$,

where $x_k$ and $\varphi_k$ are related by the equations

(6)  $x_{k+1} = x_k + G(x_k, \varphi_k)/n$,  $k = 0, 1, \ldots, N - 1$,

     $x_0 = c$,

and the variables $\varphi_k$ are constrained by the relations

(7)  $0 \le \varphi_k \le 1$,  $k = 0, 1, \ldots, N$.

Here, as above, $x_k = x(k\Delta)$, $\varphi_k = \varphi(k\Delta)$.

For each c and T, let

(8)  $f(c, T, n) = \max_{\{\varphi_k\}} J_N(\{\varphi_k\}, n)$.

We wish to show that

(9)  $\lim_{n \to \infty} f(c, T, n) = f(c, T)$.

We first require the following

Lemma: Let $G(x, \varphi)$ satisfy a Lipschitz condition for $m \le x \le M$,
$0 \le \varphi \le 1$. Let

(10)  a.  $\varphi(t)$ be a step-function with constant value $\varphi_k$, $0 \le \varphi_k \le 1$, in
          the interval $k/n \le t < (k+1)/n$, $k = 0, 1, \ldots, N$;

      b.  $\{x_k\}$ be defined recursively by (6), and the uniform bounds on
          the sequence be m and M; $m \le x_k \le M$;

      c.  $x(t)$ be the step-function with constant value $x_k$ for $k/n \le t < (k+1)/n$;

      d.  $a(t)$ be defined as the solution of the differential equation in (4).

Then there exists a constant A depending only upon G and T such that
$|x(t) - a(t)| \le A/n$, for $0 \le t \le T$.

This may be proved by the Cauchy-Lipschitz method, applied in the
same way as in the proof of the existence theorem for systems of ordinary
differential equations.
Let us now state the limit relation of (9) as

Theorem 2. Under the assumptions

(11)  a.  F and G have continuous second partial derivatives,

      b.  There exist constants p, q, r such that $px \le G(x, y) \le qx + r$
          for $x > 0$ and $0 \le y \le x$,

      c.  $G_y$ is of one sign: either $G_y > 0$ or $G_y < 0$ for all $x > 0$ and
          $0 \le y \le x$,

we have

(12)  $\lim_{n \to \infty} f(c, T, n) = f(c, T)$,

for all $c \ge 0$, $T > 0$.

Proof: Given c > 0 and T > 0, let N = [T/n], as above. The condition
of (11b) enables us to assert the uniform boundedness of x(t) for
$0 \le t \le T$. Let $m \le x(t) \le M$, and thus $m \le x_k \le M$. Since, by
assumption, $F(x, \varphi)$ and $G(x, \varphi)$ satisfy Lipschitz conditions in the region
$m \le x \le M$, $0 \le \varphi \le 1$, by virtue of the Lemma above, there exists a
constant $B_1$ dependent only on c, T, F and G, such that

(13)  $|J(\varphi) - J_N(\{\varphi_k\}, n)| \le B_1/n$,

whenever $\varphi(t)$ and $\varphi_k$ are as in the Lemma above. It follows that

(14)  $f(c, T, n) \le f(c, T) + B_1/n$,

for all n = 1, 2, ....

Let $\{n_i\}$ be a subsequence of $\{n\}$ for which
$\lim_{i \to \infty} f(c, T, n_i) = \liminf_{n \to \infty} f(c, T, n)$.
Given $\epsilon > 0$, let $\varphi(t)$ be chosen so that

(15)  $f(c, T) \le J(\varphi) + \epsilon$.

Now $\varphi(t)$ is the limit almost everywhere of a sequence $\{\varphi_m(t)\}$ of
step-functions for which $0 \le \varphi_m(t) \le 1$, and we have
$\lim_{m \to \infty} J(\varphi_m) = J(\varphi)$.

Hence we may take the $\varphi$ appearing in (15) to be a step-function, and
actually a step-function constant in each interval of the form $k/n \le t <
(k+1)/n$, for some arbitrarily large $n = n_i$. From (13) we have

(16)  $f(c, T) \le J_N(\{\varphi_k\}, n) + B_1/n + \epsilon \le f(c, T, n) + B_1/n + \epsilon$.

Hence

(17)  $f(c, T) \le \lim_{i \to \infty} f(c, T, n_i) + \epsilon = \liminf_{n \to \infty} f(c, T, n) + \epsilon$.
On the other hand, using (14) we see that

(18)  $\limsup_{n \to \infty} f(c, T, n) \le f(c, T)$.

Since $\epsilon$ is arbitrary, we see that (12) holds.

If $\{\varphi_{kn}\}$ maximizes $J_N(\{\varphi_k\}, n)$, then $\{\varphi_{kn}\}$ determines for each
n = 1, 2, ..., a step function $\varphi_n(t)$ with the property that

(19)  $\lim_{n \to \infty} J(\varphi_n) = f(c, T)$.

If there is a convergent subsequence which converges almost everywhere
to a limit $\varphi(t)$, then $\lim_{n \to \infty} J(\varphi_n) = J(\varphi)$, and $\varphi(t)$ is a
maximizing function.

If  this  function  possesses  suitable  monotonicity  properties  we  can 
employ  Helly's  theorem  to  obtain  a  convergent  sub-sequence.  Otherwise, 
we  must  use  weak  convergence  arguments  or  analogous  techniques. 

§ 13.  $\max_y \int_0^T F(x, y, t)\,dt$

So far we have considered time independent processes, those where
F and G are independent of t. Let us now treat the more general case,
that of maximizing

(1)  $J(y) = \int_0^T F(x, y, t)\,dt$,

subject to the relation

(2)  $dx/dt = G(x, y, t)$,  $x(0) = c$.

In order to apply the functional equation technique, we imbed this
problem within the wider problem of determining the maximum of

(3)  $J(y) = \int_a^T F(x, y, t)\,dt$,

subject to the constraint

(4)  $dx/dt = G(x, y, t)$,  $x(a) = c$.

Here a ranges over the interval [0, T].

Keeping T fixed, the two state variables are now a and c, and we may
write

(5)  $\max_y J(y) = f(a, c)$.
The functional equation for f is

(6)  $f(a, c) = \max_y \left[ \int_a^{a+S} F(x, y, t)\,dt + f(a + S, c(S)) \right]$,

for $0 < S < T - a$.

Letting $S \to 0$, we obtain the equation

(7)  $0 = \max_v \left[ F(c, v, a) + f_a + G(c, v, a) f_c \right]$,

where v = v(a, c) is the value of y(a).

From (7) we obtain the two equations

(8)  $0 = F(c, v, a) + f_a + G(c, v, a) f_c$,

     $0 = F_v(c, v, a) + G_v(c, v, a) f_c$.

Solving for $f_a$ and $f_c$, we obtain

(9)  $f_c = -F_v/G_v = P(c, v, a)$,

     $f_a = (F_v G - F G_v)/G_v = Q(c, v, a)$.

As above, equating the values of $f_{ca}$ and $f_{ac}$, we obtain the first order
partial differential equation for v,

(10)  $P_v v_a + P_a = Q_v v_c + Q_c$.

Those  who  are  familiar  with  quasi-linear  partial  differential  equations 
of  this  type  will  readily  verify  that  the  characteristics  of  this  equation 
are  equivalent  to  the  Euler  equations  obtained  by  classical  variational 
techniques. 

§ 14.  Generalization and discussion

If we now consider the problem of determining the maximum of the
functional

(1)  $J(y) = \int_0^T F(x(t), y(t))\,dt$,

subject to relations

(2)  $dx/dt = g(x, y)$,  $x(0) = c$,

where x, y, c and g are n-dimensional column vectors, and F is a scalar
function of x and y,1 we can proceed in a similar fashion. Setting

(3)  $f(c, T) = \max_y J(y)$,

1 Any explicit dependence upon t can always be removed by consideration of t
as a dependent variable $x_{n+1}$ defined by $dx_{n+1}/dt = 1$, $x_{n+1}(0) = 0$.
the principle of optimality yields the functional equation

(4)  $f(c, S + T) = \max_{y[0,S]} \left[ \int_0^S F(x, y)\,dt + f(c(S), T) \right]$.

The classical transversality conditions fall out as a special case in this
equation, as might be expected on the basis of the duality between point
and tangential coordinates which we have indicated above.

Carrying through the calculations similar to those in (8) and (9) in § 13
above, we obtain a system of quasi-linear partial differential equations
for the vector v = v(c, T) = y(0).

This equation has a characteristic theory, and, as is to be expected,
the characteristics are equivalent to the Euler equations of the
variational problem. The rigorous proof is quite complicated and will not be
presented here.


§  15.  Integral  constraints 

We considered above in § 7 a variational problem where y was
constrained by the condition $0 \le y \le x$. Let us now discuss the problem for
the case where we impose the additional constraint

(1)  $\int_0^T y\,dt \le m$.

The maximum of $\int_0^T F(x, y)\,dt$ will now be a function of the three state
variables c, T and m. Denote it by f(c, T, m). Using the above methods,
we see that f satisfies the equation

(2)  $f_T = \max_{0 \le v \le c} \left[ F(c, v) + G(c, v) f_c - v f_m \right]$.


Problems involving constraints of the type encountered in the
preceding sections arise in the consideration of many physical problems if
we impose realistic bounds on such quantities as velocity, acceleration,
radius of curvature, rate of allocation of resources, and so forth. Integral
constraints, such as that appearing above, or a constraint of the form
$\int_0^T y^2\,dt \le m$, appear if we assume that resources are bounded, that the
kinetic energy is bounded, and so on.

Generally  speaking,  integral  constraints  are  more  readily  handled  than 
point  constraints.  Although,  theoretically,  the  Lagrange  multiplier 
method  is  capable  of  treating  both  types  of  constraints,  as  well  as  more 
general  classes,  in  practice  we  encounter  the  difficulty  discussed  above  of 
determining  when  the  variable  is  within  the  domain  of  variability,  and 
when  it  is  on  the  boundary. 
§ 16.  Further remarks concerning numerical solution

Let us consider the problem of determining the maximum of the
integral

(1)  $J(x) = \int_0^T F(x, x', t)\,dt$,

where x(0) = c, but x is otherwise unrestrained. Assuming that F
satisfies appropriate conditions, the solution is determined by the Euler
equation

(2)  $\partial F/\partial x - \dfrac{d}{dt}\,\partial F/\partial x' = 0$.

This is a second order equation, of the form

(3)  $x'' = G(x, x', t)$,

which means that two boundary conditions are necessary to determine
the solution. One condition is furnished by the original constraint x(0)
= c, while the other, arising from the variational procedure, is

(4)  $\partial F/\partial x' \,\big|_{t = T} = 0$.


As we see, one condition is at t = 0, and the other at t = T. On the
other hand, in order to integrate (3) in a convenient fashion, either with
a digital computer or an analog computer, we require the values of x and
x' at t = 0 or at t = T. Unfortunately, we do not obtain either of these
sets of conditions from the above analysis.

We are thus confronted by the classical difficulty of a two-point boundary
condition. If G is linear in x and x', we face no particular difficulty. If,
however, as is generally true, G is nonlinear, we must face the fact that
there is no systematic technique for determining the solution of (3),
satisfying (4) and the initial condition.

The usual procedure is to start the integration at t = 0, beginning with
a range of values of x'(0), and narrowing the range until (4) is sufficiently
well approximated. This is a time-consuming procedure, sometimes
complicated by stability problems, which becomes rapidly more inefficient as
the dimension of the variational problem increases.
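
The trial-and-error procedure just described can be sketched as follows. This is a minimal illustration under assumptions that are not in the text: the names `integrate` and `shoot` are hypothetical, the integration uses a fixed-step fourth-order Runge-Kutta rule, the range of x'(0) is narrowed by bisection (which presumes the terminal miss changes sign across the starting range), and the terminal condition is taken to be x'(T) = 0. That is the form the terminal condition takes for the illustrative integrand $F = -(x'^2 + x^2)/2$, whose Euler equation is $x'' = x$. This equation is linear, so the difficulty discussed above does not arise, but the exact answer $x'(0) = -c \tanh T$ is available as a check.

```python
import math

def integrate(x0, v0, T, rhs, steps=2000):
    """Integrate x'' = rhs(x, x', t) from t = 0 to t = T with classical RK4.

    Returns the pair (x(T), x'(T)) for initial values x(0) = x0, x'(0) = v0.
    """
    h = T / steps
    x, v, t = x0, v0, 0.0
    for _ in range(steps):
        k1x, k1v = v, rhs(x, v, t)
        k2x = v + 0.5 * h * k1v
        k2v = rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v, t + 0.5 * h)
        k3x = v + 0.5 * h * k2v
        k3v = rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v, t + 0.5 * h)
        k4x = v + h * k3v
        k4v = rhs(x + h * k3x, v + h * k3v, t + h)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += h
    return x, v

def shoot(c, T, rhs, lo, hi, tol=1e-10):
    """Narrow the range [lo, hi] of trial slopes x'(0) by bisection until
    the terminal condition x'(T) = 0 is met (assumed form of the condition)."""
    def miss(s):
        return integrate(c, s, T, rhs)[1]
    f_lo, f_hi = miss(lo), miss(hi)
    if f_lo * f_hi > 0:
        raise ValueError("terminal miss does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = miss(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

For example, `shoot(1.0, 1.0, lambda x, v, t: x, -2.0, 0.0)` searches the slopes between -2 and 0 for the problem with c = 1 and T = 1. Each trial slope costs a full integration over [0, T], and in several dimensions the search over initial slopes is no longer one-dimensional, which gives some feeling for the inefficiency noted above.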

We have assumed that F is a function possessing a sufficiently smooth
behavior to justify the use of (2). If we allow F to possess terms such as
$|x - a|$ or $\max(x - a, x' - b, g(t))$, functions which arise very
naturally in economic and engineering processes, the application of the
usual variational approach becomes increasingly difficult.

Combine the above complications with those furnished by the existence
of constraints, and we see that conventional methods must be
sup- [...] to resolve a variety of problems arising in a very
[...] in the physical world.

Let us note, finally, that the remarks we made concerning the need
for a sensitivity or stability analysis, in Chapter 7 and in connection with
discrete decision processes, are, of course, equally valid in the context of
continuous decision processes.

§ 17.  Eigenvalue problem

Let us now devote our attention to the problem of determining the
values of $\lambda$ which permit a non-trivial solution of the equation

(1)  $u'' + \lambda \varphi(t) u = 0$,

     $u(0) = u(1) = 0$,

to exist.

The connection between our previous work and this problem, which at
first sight seems far removed, arises from the fact that under light
conditions on $\varphi(t)$, the eigenvalue problem is equivalent to the problem of
determining the relative minima of $\int_0^1 u'^2\,dt$ subject to the constraints

(2)  $\int_0^1 \varphi(t) u^2\,dt = 1$,  $u(0) = u(1) = 0$,

or, conversely, to that of determining the relative maxima of $\int_0^1 \varphi(t) u^2\,dt$
subject to the constraints

(3)  $\int_0^1 u'^2\,dt = 1$,  $u(0) = u(1) = 0$.

What makes this problem different in quality from those we have
considered above is the fact that as we traverse an extremal the condition
u(0) = 0 is violated. Consequently, we must imbed this problem in a
more general class of problems possessing the requisite invariance
properties if we wish to employ the functional equation approach. Happily,
there are several ways of doing this.

In the first approach, we consider the minimization of

(4)  $J(u) = \int_a^1 u'^2\,dt$,

over all u satisfying the conditions

(5)  (a)  $u(a) = k$,  $u(1) = 0$,

     (b)  $\int_a^1 \varphi(t) u^2\,dt = 1$.
Here the new state variable a satisfies the condition $0 \le a < 1$. We
assume that the function $\varphi(t)$ satisfies the constraint $0 < b_1 \le \varphi(t) \le b_2$
for $0 \le t \le 1$, and is continuous over [0, 1].

An equivalent problem is that of maximizing

(6)  $K(u) = \int_a^1 \varphi(t) u^2\,dt$,

subject to the constraints

(7)  (a)  $u(a) = k$,  $u(1) = 0$.


A second less obvious formulation that serves the purpose is the
following: Minimize

(8)  $J(u) = \int_a^1 u'^2\,dt$

subject to the constraints

(9)  (a)  $u(a) = 0$,  $u(1) = 0$,

     (b)  $\int_a^1 \left[ \varphi(t) u^2 + k(1 - t) \varphi(t) u \right] dt = 1$.

§  18.  The  first  formulation 

Let  us  set 

(1)  $f(a, k) = \min_u \int_a^1 u'^2\,dt$,

subject to the constraints

(2)  (a)  $u(a) = k$,  $u(1) = 0$,

     (b)  $\int_a^1 \varphi(t) u^2\,dt = 1$.


We write, along an extremal,

(3)  (a)  $\int_{a+s}^1 \varphi(t) u^2\,dt = 1 - s \varphi(a) k^2$,

     (b)  $u(a + s) = k + sv$,

     (c)  $f(a, k) = v^2 s + \int_{a+s}^1 u'^2\,dt$,

to terms in o(s).*

* In order to simplify the analysis, we shall proceed directly to the derivation
of the limiting partial differential equation.
Now make the change of variable

(4)  $u(t) = [1 - s\, \varphi(a)\, k^2/2]\, w(t)$,  $a + s \le t \le 1$,

in order to maintain the condition (2b). We then have

(5)  (a)  $w(a + s) = k + s v + s\, \varphi(a)\, k^3/2$,

     (b)  $f(a, k) = v^2 s + (1 - s\, \varphi(a)\, k^2) \int_{a+s}^1 w'^2 \, dt$,

to terms in o(s).

Combining the above results, we obtain the approximate functional
equation

(6)  $f(a, k) = \min_v [\, v^2 s + (1 - s\, \varphi(a)\, k^2)\, f(a + s,\; k + s v + s\, \varphi(a)\, k^3/2) \,] + o(s)$.

Letting $s \to 0$, the result is the equation

(7)  $0 = \min_v [\, v^2 + v f_k \,] + f_a + \frac{\varphi(a)\, k^3}{2} f_k - \varphi(a)\, k^2 f$,

or

(8)  $f_a = f_k^2/4 - \varphi(a)\, k^3 f_k/2 + \varphi(a)\, k^2 f$.

The initial condition is at a = 1, and is not trivial, since $f(a, k) \to \infty$ as
$a \to 1$. There are two ways to determine this initial condition, as we shall
discuss in the next section.
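
The passage from (7) to (8) rests on an elementary minimization over v. As a worked step added for the reader, using only the symbols of (7), with $f_k$ and $f_a$ denoting partial derivatives, it may be written out as follows:

```latex
\min_v \,[\, v^2 + v f_k \,] \ \text{is attained at}\ v = -\tfrac{1}{2} f_k,
\ \text{with value}\ -\tfrac{1}{4} f_k^2 ,
\\
0 = -\tfrac{1}{4} f_k^2 + f_a + \tfrac{1}{2}\,\varphi(a)\, k^3 f_k - \varphi(a)\, k^2 f
\;\Longrightarrow\;
f_a = \tfrac{1}{4} f_k^2 - \tfrac{1}{2}\,\varphi(a)\, k^3 f_k + \varphi(a)\, k^2 f .
```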

§  19.  An  approximate  solution 

If a is close to 1, and $\varphi(t)$ continuous, as assumed, we may replace the
variational problem in (17.1) and (17.2) by the approximate problem:

Minimize $\int_a^1 u'^2 \, dt$, subject to the constraints

(1)  (a)  $u(a) = k$,  $u(1) = 0$,

     (b)  $\int_a^1 u^2 \, dt = 1$,

upon absorbing the factor $\varphi(1)$ into the function u(t).

This problem may be approached in two ways. Using the classical approach,
we obtain the Euler equation

(2)  $u'' + \lambda u = 0$,

which may be resolved explicitly. The unknown parameter is determined
by the constraints in (1a) and (1b).
The second method uses (17.8) with $\varphi(a) = 1$. Since the solution of the
problem in (1) above is known for k = 0, namely

(3)  $f(a, 0) = \left( \frac{\pi}{1 - a} \right)^2$,

we can obtain a solution to (17.8) as a power series in k, for k > 0. Since
we are primarily interested in the solution for small k, this is a useful
form of the solution for numerical purposes.

§  20.  Second  formulation 

We leave the derivation of the corresponding partial differential equation
for the variational problem defined by (16.8) and (16.9) as an exercise
for the reader, with the hint that the essential point is to renormalize
u(t) constantly so as to maintain the initial condition u(a) = 0.

§21.  Discrete  approximations 

Since the partial differential equation for the minimum f(a, k) possesses
certain unpleasant features as far as initial values are concerned, the
following discrete formulation may be of value.

Let us consider the problem of minimizing the function

(1)  $F(u_1, u_2, \ldots, u_{N-1}) = \sum_{k=1}^{N} (u_k - u_{k-1})^2$,

subject to the constraints

(2)  (a)  $\sum_{k=1}^{N} \varphi_k^2 u_k^2 = 1$,

     (b)  $u_0 = a$,  $u_N = 0$.

Corresponding to the use of the state variable R, we consider the sequence
$\{f_R(a)\}$ defined as follows

(3)  $f_R(a) = \min_{\{u_k\}} \sum_{k=R}^{N} (u_k - u_{k-1})^2$,

where the $u_k$ are subjected to

(4)  (a)  $\sum_{k=R}^{N} \varphi_k^2 u_k^2 = 1$,

     (b)  $u_{R-1} = a$,  $u_N = 0$,

for R = 1, 2, ..., N - 1.

Since this involves a variable range for each quantity $u_k$, let us make a
change of variable

(5)  $\varphi_k u_k = v_k$,

under the assumption that $0 < b_1 \le \varphi_k \le b_2 < \infty$ for k = 0, 1, 2, ..., N.
Then

(6)  $f_R(a) = \min_{\{v_k\}} \sum_{k=R}^{N} \left( \frac{v_k}{\varphi_k} - \frac{v_{k-1}}{\varphi_{k-1}} \right)^2$,

where

(7)  (a)  $\sum_{k=R}^{N} v_k^2 = 1$,

     (b)  $v_{R-1} = \varphi_{R-1}\, a$,  $v_N = 0$.

We leave as an exercise the task of determining the recurrence relation
for the sequence $\{f_R(a)\}$.


§  22.  Successive  approximation 
Returning to the equation

(1)  $f(c) = \max_{y[0, S]} \left[ \int_0^S F(x, y)\, dt + f(c(S)) \right]$,

obtained in § 3, it is tempting to envisage the use of successive
approximations for the solution of the equation. If, however, we choose an
initial function $f_0(c)$, and define a second approximation by means of the
equation

(2)  $f_1(c) = \max_{y[0, S]} \left[ \int_0^S F(x, y)\, dt + f_0(c(S)) \right]$,

we see that in the limit, as $S \to 0$, we must have $f_1(c) = f_0(c)$, provided
that $f_0(c)$ is continuous.

At first sight, this would seem to render the use of successive
approximations impossible. Actually this is not so. What is true is that we must
approximate in policy space rather than in function space. We must
concentrate our attention primarily upon v = v(c, T) rather than upon
f(c, T). Nonetheless, f(c, T) still plays an important auxiliary role.

To illustrate this point, let us discuss the problem of maximizing

(3)  $J(y) = \int_0^T F(x, y)\, dt$,

subject to the relations

(4)  $dx/dt = G(x, y)$,  $x(0) = c$.

Then, as before, we obtain the equation

(5)  $f_T = \max_v [\, F(c, v) + G(c, v)\, f_c \,]$.
Let us now choose an initial approximation $v_0 = v_0(c, T)$, which is
equivalent to $y_0 = y_0(x, T - t)$, keeping in mind the connection between
physical time t, and T, the remaining time for the process. Using this
value of $y_0$, we compute $x_0$ by means of the differential equation,

(6)  $dx_0/dt = G(x_0, y_0(x_0, T - t))$,  $x_0(0) = c$,

and then $f_0(c, T)$ by means of

(7)  $f_0(c, T) = \int_0^T F(x_0, y_0)\, dt$.

This function, $f_0$, satisfies the linear partial differential equation

(8)  $f_{0T} = F(c, v_0) + G(c, v_0)\, f_{0c}$.

To obtain the next approximation to an extremal v, or an optimal v,
we determine $v_1(c, T)$ as a function which maximizes the function

(9)  $F(c, v) + G(c, v)\, f_{0c}$.

Using $v_1(c, T)$, we obtain $y_1(x, T - t)$ and then $x_1$ and $f_1$ as above.
Having obtained $f_1$ we compute $v_2$ as a function which maximizes

(10)  $F(c, v) + G(c, v)\, f_{1c}$,

and continue in this way, deriving a sequence of approximations to
f, $\{f_n\}$, and a sequence of approximations to v, $\{v_n\}$.

§ 23. Monotone approximation

Let us now show that this sequence of approximations to f is monotone
increasing, a fact which is important theoretically and computationally.
We have

(1)  $f_{1T} = F(c, v_1) + G(c, v_1)\, f_{1c}$,

     $f_{0T} = F(c, v_0) + G(c, v_0)\, f_{0c} \le F(c, v_1) + G(c, v_1)\, f_{0c}$.

Hence

(2)  $(f_1 - f_0)_T \ge G(c, v_1)\, (f_1 - f_0)_c$.

Since $f_1(c, 0) = f_0(c, 0) = 0$, we see that $f_1 - f_0 \ge 0$ for all $T \ge 0$.
Continuing in this way, we readily establish the monotone property
of the sequence $\{f_n\}$. If the sequence is uniformly bounded, we have
convergence. However, it is essential to know when the sequences of
partial derivatives, $\{f_{nc}\}$, $\{f_{nT}\}$, and the sequence of policies $\{v_n\}$ also
converge. The general question is a difficult one and we shall not enter
into it here.

It  is  interesting  to  note,  however,  that  we  do  possess  a  systematic 
technique  for  improving  any  particular  policy. 
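
The improvement scheme of § 22 and its monotone property can be tried out on a small discrete analogue. The sketch below is only an illustration under stated assumptions: the state grid, the three admissible decisions, the dynamics `step`, and the stage return `reward` are all hypothetical choices made here, and the partial differential equation is replaced by a stage-by-stage recurrence over the number T of remaining stages.

```python
# Discrete analogue of approximation in policy space (illustrative assumptions only).
STATES = range(0, 7)          # hypothetical grid of states c
CONTROLS = (-1, 0, 1)         # hypothetical admissible decisions v
HORIZON = 6                   # number of stages T

def step(c, v):
    """Hypothetical dynamics: move by v, clipped to the grid."""
    return min(max(c + v, 0), 6)

def reward(c, v):
    """Hypothetical stage return F(c, v)."""
    return -(c - 3) ** 2 - 0.5 * v * v

def evaluate(policy):
    """f[T][c] = total return of following `policy` for T stages from state c."""
    f = [{c: 0.0 for c in STATES}]                     # f(c, 0) = 0
    for T in range(1, HORIZON + 1):
        f.append({c: reward(c, policy[T][c]) + f[T - 1][step(c, policy[T][c])]
                  for c in STATES})
    return f

def improve(f):
    """For each (c, T) pick the v maximizing F(c, v) + f(step(c, v), T - 1)."""
    policy = {}
    for T in range(1, HORIZON + 1):
        policy[T] = {c: max(CONTROLS,
                            key=lambda v: reward(c, v) + f[T - 1][step(c, v)])
                     for c in STATES}
    return policy

def policy_iteration(initial_policy, sweeps=5):
    """Alternate evaluation and improvement; keep every f_n for inspection."""
    policy, history = initial_policy, []
    for _ in range(sweeps):
        f = evaluate(policy)
        history.append(f)
        policy = improve(f)
    return policy, history

# Initial approximation v_0: never move.
v0 = {T: {c: 0 for c in STATES} for T in range(1, HORIZON + 1)}
final_policy, history = policy_iteration(v0)
```

Each entry of `history` plays the role of one $f_n$; comparing consecutive entries state by state shows that no value ever decreases, which is the discrete counterpart of the inequality $f_1 - f_0 \ge 0$ derived above.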


§ 24. Uniqueness of solution

As we have noted above, we are bypassing any of the rigorous aspects
of the derivation of the partial differential equations we have encountered
and any study of the existence of the solution of these equations.
It is, however, worth noting that the uniqueness of solution may be
established quite readily by means of the same device we have formalized
as Lemma 1 in Chapter 3.

Consider, for example, the equation

(1)  $f_T = \max_v [\, F(c, v) + G(c, v)\, f_c \,]$,

and assume that there exists another solution of this equation, g = g(c, T),
which possesses the same initial value, namely

(2)  $f(c, 0) = g(c, 0) = 0$,

for all c. Then, we have also

(3)  $g_T = \max_w [\, F(c, w) + G(c, w)\, g_c \,]$.

Let v = v(c, T) be a function which furnishes the maximum in (1) and
w = w(c, T) a function which furnishes the maximum in (3). Then we
have the inequalities

(4)  $f_T = F(c, v) + G(c, v)\, f_c \ge F(c, w) + G(c, w)\, f_c$,

     $g_T = F(c, w) + G(c, w)\, g_c \ge F(c, v) + G(c, v)\, g_c$.

These inequalities yield

(5)  $f_T - g_T \ge G(c, w)\, (f_c - g_c)$,

     $f_T - g_T \le G(c, v)\, (f_c - g_c)$.

Thus, if we set u = f - g, we see that u satisfies the inequalities

(6)  $G(c, w)\, u_c \le u_T \le G(c, v)\, u_c$.

Since the solutions of

(7)  $x_T - G(c, w)\, x_c = 0$,  $x(c, 0) = 0$,

     $y_T - G(c, v)\, y_c = 0$,  $y(c, 0) = 0$,

are identically zero, it follows from a comparison theorem that u is
identically zero.

§  25.  Minimum  maximum  deviation 

Let us now discuss the numerical solution of a variational problem
of the following type: Minimize

(1)  $\max_{0 \le t \le T} |u - a|$,

over all functions v(t) satisfying the constraint $-1 \le v \le 1$, where

(2)  $du/dt = g(u, v)$,  $u(0) = c_1$.

Consider the corresponding discrete process where

(3)  $u_{k+1} = u_k + g(u_k, v_k)\, \Delta$,  $u_0 = c_1$,

and $u_k = u(k\Delta)$, $\Delta = T/N$, $v_k = v(k\Delta)$.

Define

(4)  $f_N(c_1) = \min_{\{v_k\}} \max_{0 \le k \le N} |u_k - a|$.

Then

(5)  $f_0(c_1) = |c_1 - a|$,

and

(6)  $f_{N+1}(c_1) = \max [\, |c_1 - a|,\; \min_{|v| \le 1} f_N(c_1 + g(c_1, v)\, \Delta) \,]$,

for N = 0, 1, 2, ....

We have thus reduced the solution of the original variational problem
to a computation of a sequence of functions of one variable determined
by the foregoing recurrence relation.
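
The recurrence above lends itself directly to computation. The following is a minimal sketch, not a definitive implementation: the dynamics `g(u, v) = u + v`, the target level `A`, the step `DELTA`, and the coarse sample of controls standing in for the interval $-1 \le v \le 1$ are all assumptions chosen here for illustration.

```python
from functools import lru_cache

A = 1.0                                   # target level a (assumed value)
DELTA = 0.25                              # step size Delta (assumed value)
CONTROLS = (-1.0, -0.5, 0.0, 0.5, 1.0)    # coarse sample of -1 <= v <= 1

def g(u, v):
    """Hypothetical dynamics du/dt = g(u, v)."""
    return u + v

@lru_cache(maxsize=None)
def f(n, c):
    """Min over controls of max_k |u_k - A| over n remaining steps, u_0 = c."""
    here = abs(c - A)                     # deviation at the current stage
    if n == 0:
        return here                       # f_0(c) = |c - a|
    # Inner minimum of the recurrence: best continuation after one step.
    best_future = min(f(n - 1, c + g(c, v) * DELTA) for v in CONTROLS)
    return max(here, best_future)

print(f(0, 2.5), f(1, 2.5), f(3, 1.0))
```

Starting at the target, the choice v = -1 keeps the state fixed under these assumed dynamics, so the deviation stays zero; starting at 2.5 the state drifts away from the target whatever the control, and the computed value grows with the number of stages.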


Exercises  and  Research  Problems  for  Chapter  IX 


1.  Obtain  functional  equations  for  the  following  quantities 


a. 

Max 
/  ■ 

j‘f*di,/(0)  =c, 

J>*_1 

b. 

Max 
f  ■ 

/*"*./(  0)  =  c. 

(/*)*"  dt  =  1 

c. 

Max 
/  ■ 

JV'./(0)  =  c. 

JV  ^  «,  <V2  dt  <,  h 

2.  Obtain  functional  equations  for  the  following  quantities 

a.  Max  f  fgdt,f{ 0)  =  c,f  monotone  increasing,  f  /*  dt  <L  1 

/Jo  Jo 

b.  Max  f  fgdt,f(0)  =  c,f  monotone  increasing  and  convex  (concave), 

/  Jo 
3. Carry through the analysis suggested in § 18 and obtain the first few
terms  of  the  expansion  of  /(a,  k)  as  a  power  series  in  k,  for  a  close  to  1. 

4.  Follow  the  same  procedure  for  the  second  formulation  of  the  eigenvalue 
problem. 

5.  Obtain  a  functional  equation  for  the  following  quantity 

$\min_f \int_0^T [(x - c)^2 + k f^2] \, dt$,

$dx/dt = -ax + f$,  $x(0) = c$.

6.  Obtain  a  corresponding  result  for  the  general  case 

Min  fT  [£  '(*<*>  —  c,-)*  +  /*]  dt , 

J  Jo  l  -  (I 

*<v)  =_:  je<*v-D  -f-  ...  -(-  as  x  +/, 

*<*>  (0)  =  etc,  k  =  0,  1 . iV  —  1 . 

7. Use the functional equation approach to determine the minimum of
$\int_0^T (1 - x)^2 \, dt$ over f where $0 \le f \le M$, $M > 1$, $dx/dt = -x + f$,
$x(0) = 1$, and $\int_0^T f \, dt \le a < T$.

8. Under the same conditions determine the minimum of
$\int_0^T (dx/dt)^2 \, dt$.

fT 

9.  Determine  the  minimum  of  I  dt  over  all  /satisfying 

$dx/dt = -x + f$,  $x(0) = 1$,  $1 - a \le x \le 1 + a$ for $0 \le t \le T$.

10. Determine the minimum of $\int (x - y) \, dt$ over y, given that

a. $dx/dt = b(y)$,  $x(0) = c$,  $0 \le y \le x$,

b. $b''(y)$ is continuous and $b''(y) < 0$, $b'(y) > 0$,

c. $b'(0+) = +\infty$.


11. Consider the same problem under the assumption that $b'(0+)$ is
finite.
$\int_0^T [K(y) + L(x - y)] \, dt$ over all y where

a. $K(0) = L(0) = 0$,

b. $K''(x),\ L''(x) \ge 0$ for all x,

c. $dx/dt = -ay - b(x - y)$,  $x(0) = c$,  $b > a > 0$.

13. Consider the problem of minimizing the functional

$J(x) = \int_s^T [\, a_1(t)\, (x - b_1(t))^2 + a_2(t)\, (x' - b_2(t))^2 \,] \, dt$,

$0 \le s \le T$, over all functions x such that x(s) = c, and $\int_s^T x'^2 \, dt < \infty$.

Assume that all functions appearing are continuous and that $a_i(t) > 0$
in the interval [0, T].

Define

$f(c, s) = \min_x J(x, s)$.

Show  that 

/*  =  —«i  («)  (c  —  by  (s))*  +  —fc*l\at  (s)  , 

$f(c, T) = 0$ for all c.

14. Show that $f(c, s) = u(s) + c\, v(s) + c^2 w(s)$, where u, v and w depend
only upon s.

15. Show that u, v and w satisfy the equations

(a)  «' (s)  =  —  «i  (s)  byl  (s)  +  b2  (s)  r  (s)  —  r*  (s)/4a,  (s)  , 

(b)  v'  (s)  =  2a,  (s)  by  (s)  4-  2 6,  (s)  if  (s)  —  i’  (s)  if  (s)/«,  (s)  , 

(c)  if'  (s)  =  —  a,  (s)  —  if*  (sj'/a,  (sTT 
with  u  ( T )  =  v  ( T )  =  if  (T)  =  0. 

16.  Obtain  the  corresponding  results  for  the  functional 

7  (*)  =  [a,  (0  (x  -  by  (/))•  +  (0  <*'  -  6,  (0)*  4- 

«.  (t)  (*'  -  (<))•] 

17. Consider the following discrete analogue of the problem in 13.

We wish to minimize the function

$J(x) = \sum_{K=1}^{N} [\, \varphi_K(x_K) + \psi_K(x_K - x_{K-1}) \,]$,

over all possible values of the $x_K$, K = 1, 2, ..., N, with $x_0 = x$. As
usual, assume that $\varphi_K(x)$ and $\psi_K(x)$ are continuous functions, with
appropriate properties at $x = \pm\infty$.

Show that this problem leads to the sequence $\{f_R(x)\}$ defined as follows

$f_N(x) = \min_{x_N} [\, \varphi_N(x_N) + \psi_N(x_N - x) \,]$,

$f_R(x) = \min_{x_R} [\, \varphi_R(x_R) + \psi_R(x_R - x) + f_{R+1}(x_R) \,]$.

18. Consider, in particular, the case where $\varphi_K$ and $\psi_K$ are quadratic in x,
with $\varphi_K = b_K (x - d_K)^2$, $\psi_K = c_K x^2$. Show that, in this case,
$f_N(x) = u_N + v_N x + w_N x^2$, where $u_N$, $v_N$ and $w_N$ are independent of x.

19. Show that

$u_{N-1} = b_{N-1} d_{N-1}^2 + u_N - \frac{(d_{N-1} b_{N-1} - v_N/2)^2}{b_{N-1} + c_{N-1} + w_N}$,

$v_{N-1} = \frac{-2 c_{N-1} (d_{N-1} b_{N-1} - v_N/2)}{b_{N-1} + c_{N-1} + w_N}$,

$w_{N-1} = \frac{c_{N-1} (b_{N-1} + w_N)}{b_{N-1} + c_{N-1} + w_N}$.

20. Let $\{x_K\}$ denote the sequence of minimizing $x_K$'s. Show that

$x_1 = \frac{c_1 x + d_1 b_1 - v_2/2}{b_1 + c_1 + w_2}$,

$x_K = \frac{c_K x_{K-1} + d_K b_K - v_{K+1}/2}{b_K + c_K + w_{K+1}}$.


21. Treat in a corresponding manner the problem of minimizing the
expression

$J(x) = \sum_{K=1}^{N} [\, a_K (x_K - b_K)^2 + c_K (x_K - x_{K-1})^2 + e_K (s_K - d_K)^2 \,]$,

where $s_K = x_1 + x_2 + \ldots + x_K$.

22. Consider the stochastic case where the parameters appearing are
stochastic variables, and it is desired to minimize the expected value of
J(x).


23. Consider the scalar equation

$du/dt = g(u, v)$,  $u(0) = c$,

where v is to be chosen so as to minimize the functional J(v) =
Let $f(c, T) = \min_v J(v)$, and derive a functional equation for f.

24.  Consider  the  process  where  we  wish  to  minimize 

J  h  (u  —  «•  (/))  dt . 

25. Consider the problem where we wish to minimize

$\max_{0 \le t \le T} |u - a(t)|$,

where a(t) is a known function of t.

26.  Consider  a  corresponding  two-dimensional  problem,  using  the 

equation  — 

u"  +  —  1)  h'  +  u  =  /(<)  , 

with $-1 \le f(t) \le 1$.

27.  Treat  in  the  same  fashion  the  problem  of  maximizing 

J  (/-  J?)  =  Min  (*.  y  dt , 

where 

$dx/dt = ax + f$,  $x(0) = c_1$,
$dy/dt = by + g$,  $y(0) = c_2$,

and the functions f(t) and g(t) satisfy the constraints

$f + g = 1$,  $f, g \ge 0$.

28.  Consider  the  equation 

d'ujdt 1  -f-  a*  u  —  f  (/),  n  (0)  =  c,,  u'  (0)  =  c„  <  1. 

We wish to choose f(t) subject to $-1 \le f(t) \le 1$ so as to reduce u to
zero in minimum time. What is the corresponding functional equation?

29. Obtain the solutions of the brachistochrone and isoperimetric problems
using the functional equation approach.

30.  Determine  the  path  of  a  ray  of  light  through  an  inhomogeneous 
medium,  assuming  that  the  path  minimizes  time. 

31. Consider the problem of determining the minimum of

$J_N(w) = \max_{0 \le k \le N} \max [\, |u_k|,\ |1 - v_k| \,]$,

over all sequences $\{w_k\}$ satisfying the conditions $|w_k| \le 1$, where

$u_{k+1} = g(u_k, v_k, w_k)$,  $u_0 = c_1$,

$v_{k+1} = h(u_k, v_k, w_k)$,  $v_0 = c_2$.

Define

$f_N(c_1, c_2) = \min_w J_N(w)$.

Show that

$f_{N+1}(c_1, c_2) = \max [\, |c_1|,\ |1 - c_2|,\ \min_{|w| \le 1} f_N(g(c_1, c_2, w),\ h(c_1, c_2, w)) \,]$.

32. Obtain the corresponding recurrence relations for the case where

$J_N(w) = \max_{0 \le k \le N} D_k(u_k, v_k)$.

33. Consider a rocket-powered aircraft moving in level flight with a
mass equal to the fixed mass of the aircraft, w, plus the mass of the
fuel, m. The force of propulsion is taken to be a known function of the
rate of burning of the fuel, the velocity of the aircraft and the mass of
the fuel. Equivalently the force of propulsion is a known function of
the thrust and drag, which are, in turn, known functions of the burning
rate, the velocity, and the mass.

Let

(1)  x(t) = the distance along the x-axis from the origin at time t,

     v(t) = velocity of the aircraft,

     m(t) = mass of fuel,

     w = fixed mass of aircraft,

     y(t) = burning rate of fuel,

     F(y, v, m) = force of propulsion.


$(w + m) \frac{dv}{dt} = F(y, v, m)$,  $v(0) = v_0$,

$\frac{dm}{dt} = -y$,  $m(0) = m_0$.

34. Consider a discrete version of the process described above, and
impose a restriction on the burning rate, $0 \le y(t) \le K$.

Let

f(v, m) = the range covered starting with initial velocity v and a
quantity of fuel m, ending with terminal velocity $v_t$,
using an optimal burning policy.




Show that

$f(v, m) = v\Delta + \max_{0 \le y \le K} \left[ f\left( v + \frac{F(y, v, m)\, \Delta}{w + m},\ m - y\Delta \right) \right]$,

and show how the quantity $v_t$ enters.

(R. Bellman - S. Dreyfus, H. Cartaino - S. Dreyfus)*

35. Similarly, let us define

f(v, m, d) = the time required to traverse a distance d, starting from
initial velocity v and a given quantity m of fuel, with a
required terminal velocity $v_t$, using an optimal burning
policy.

Show that

$f(v, m, d) = \Delta + \min_{0 \le y \le K} \left[ f\left( v + \frac{F(y, v, m)\, \Delta}{w + m},\ m - y\Delta,\ d - v\Delta \right) \right]$.


36. Consider the equation

$\frac{d^2 x}{dt^2} + (x^2 - 1) \frac{dx}{dt} + x = r(t) + v\left(x, \frac{dx}{dt}, t\right)$,  $x(0) = c_1$,  $x'(0) = c_2$,


where the function $v = v(x, dx/dt, t)$ is to be determined, subject to the
constraint $|v| \le 1$, so as to minimize the expected value of

$J(x) = \int_0^T x^2 \, dt + |x(T)|$,

over a suitable class of random functions r(t).

Going over to the discrete version, show that we obtain the recurrence
relation

$f_0(c_1, c_2) = \Delta c_1^2 + |c_1 + c_2 \Delta|$,

$f_N(c_1, c_2) = \min_{|v_0| \le 1} \left[ \Delta c_1^2 + \int_{-\infty}^{\infty} f_{N-1}(c_1 + c_2 \Delta,\ c_2 + [-(c_1^2 - 1) c_2 - c_1 + r_0 + v_0]\, \Delta) \, dG(r_0) \right]$,

where $dG(r_0)$ is the distribution function for the independent random
variables $\{r_i\}$.

37.  Consider  the  linear  equation 

^2  dx 

,  ^  +  x  =  r  (t)  +  v  (x,  X  ,  t),  x  (0)  =  Ci,  x'  (0)  =  a, 
d  P  at 

*  H.  Cartaino  -  S.  Dreyfus,  Application  of  Dynamic  Programming  to  the 
Minimum  Time-to-Climb  Problem,  Aeronautical  Engineering  Review,  1957. 
where v is to be determined so as to minimize the expected value of

$J(x) = \int_0^T [\, x^2 + (dx/dt)^2 \,] \, dt$.

Find the corresponding recurrence relations, determine the structure of
the sequence $\{f_N(c_1, c_2)\}$, and the optimal policy.*

38.  Returning  to  the  problem  discussed  in  7,  consider  the  problem  of 
determining  v  so  as  to  minimize 

J  —  Prob  {  Max  |x|  ^  a  }  . 

O  <  t  <  T 

Show that the discrete version leads to the recurrence relation

$f_{N+1}(c_1, c_2) = \min_{|v_0| \le 1} \int_{-\infty}^{\infty} f_N(c_1 + c_2 \Delta,\ c_2 + [-(c_1^2 - 1) c_2 - c_1 + r_0 + v_0]\, \Delta) \, dG(r_0)$.

39. Consider the case where the $r_i$ are not independent. Assume, to begin
with, that the distribution of $r_{n+1}$ depends on the value of $r_n$. Define

$f_N(c_1, c_2, r)$ = minimum expected value of $J_N$, given the initial state
$(c_1, c_2)$, and the information that the value of the random
variable at the preceding stage was r.

Show that the recurrence relation for the sequence $\{f_N\}$ is

/•  co  I 

fs  (ci,  c-i,  r)  —  Min  [Aci2  +  fs  _  i  (ci  4-  c2  A,  c2  4- 

J  -  On 

[—  (Cl2  —  1)  c2  —  Cl  +  r0  4-  I'o]  A)  dG  (r0,  -’)] 


40. Consider the problem of determining a monotone decreasing sequence
of approximations to the first characteristic value of $u'' + \lambda \varphi(t)\, u = 0$,
u(0) = u(1) = 0. Let $\varphi(t)$ be a continuous positive function of t in
[0, 1], so that the first characteristic value is defined by

$\lambda_1 = \min_u \frac{\int_0^1 u'^2 \, dt}{\int_0^1 \varphi(t)\, u^2 \, dt}$.

Let us approximate in policy space by considering functions u'(t)
which are constant on the intervals $[k\Delta, (k + 1)\Delta]$,
k = 0, 1, 2, ..., N - 1, $N\Delta = 1$, i.e.

id  (t)  =  ha.  kA  <  t  <  (k  +  1)  A. 

Let $\lambda_1(N)$ denote the minimum over this space. Show that

$\lambda_1(N) \ge \lambda_1(2N)$,

* R. Bellman, Dynamic Programming and Stochastic Control Processes,
Trans. I.R.E., 1957.
and derive a recurrence method for computing $\lambda_1(N)$.

41.  Consider  the  corresponding  problem  for  the  equation 

«(4)  +  hp(t)u  =  0,«  (0)  =  u'  (0)  =  u  (1)  =  «'  (1)  =  0. 
corresponding  to  the  variational  problem  defined  by 


Xi  —  Min 


fo  9  ®  U * 


dt 
Multi-Stage Games


§  1.  Introduction 

In the previous chapters we have discussed a number of decision processes
which, although of different origins and varying analytic structure,
possess an important feature in common: all decisions are directed
towards the single goal of maximizing the value of the criterion function.
In this chapter we shall consider a class of multi-stage decision processes
where this unanimity of purpose no longer holds true. Some decisions
will be made to maximize and some to minimize.

Perhaps the most interesting fashion in which we encounter these
cross-purpose processes is in the study of activities in which two animate
adversaries counter optimal moves at each stage of the process.

A number of situations in the economic world may be profitably considered
in these terms, and the theory of games of chance and skill
affords a number of fascinating applications of the general techniques.

Furthermore,  in  the  physical  world,  in  connection  with  testing  and 
experimentation,  it  is  often  useful  to  conceive  of  nature,  in  some  vague 
anthropomorphic  fashion,  as  an  opponent  attempting  to  conceal  the 
truth  from  us.  The  design  of  experiments  may  be  conceived  of  as  a  game 
in  which  we  attempt  to  extract  information  from  a  stubborn,  but  fair, 
opponent. 

The mathematical theory developed in recent years to treat problems
characterized by this interplay between divergent aims is the theory of
games. Although a good deal of effort had been directed in this direction
by E. Borel, the theory rests upon a fundamental result of von Neumann,
the celebrated min-max theorem. We shall very briefly discuss the
foundations of the theory prior to a discussion of multi-stage games.

These multi-stage games may be considered not only to constitute an
extension of the single-stage theory, but in many ways they may be
considered to be more fundamental. The single-stage game may be conceived
of as a steady-state version of an original dynamic process, namely the
multi-stage process.

After these preliminaries, we shall discuss some particular multi-stage
games arising from multi-stage allocation processes, and then consider
"games of survival" and pursuit games. Following these examples, we
shall present a general formulation, along the lines of Chapter 3, and then,
as in Chapter 4, prove a number of existence and uniqueness theorems for
certain important special classes of equations.

In the main, the techniques used correspond to those employed in the
treatment of the one-person processes. Games of survival, however,
present special difficulties, requiring more advanced tools for a general
treatment. The method we employ is applicable only to a restricted class
of equations.

One of the interesting aspects of games of survival is the application
of this concept to the study of non-zero-sum games, where the players are
no longer in direct opposition. A formulation of these games in terms of
survival enables us to remetrize these games so as to make them zero-sum.
Furthermore, as we shall show below, a quite reasonable approximation
enables us to derive a new metric for non-zero-sum games, one with
an associated min-max theorem.

§ 2. A Single-stage discrete game

We shall now consider a class of decision processes involving two
persons which we shall call games. The two protagonists, whom we shall
call players, will be named rather prosaically A and B.1 Let us consider a
particular game.

The rules of the game are as follows. The first player, A, has a choice of
M different plays, which we shall designate by the numbers 1, 2, ...,
M, and the second player, B, has a choice of N different plays, denoted by
1, 2, ..., N. If A chooses the i-th of his alternatives and B the j-th of
his alternatives, A receives a quantity $a_{ij}$ and B a quantity $b_{ij}$. If these
quantities are positive, we may think of them as gains, and if negative as
losses.

A convenient way to indicate these returns, or payoffs, is by means of
the two payoff or game matrices

(1)  $M_A = (a_{ij})$,  $M_B = (b_{ij})$,  $1 \le i \le M$,  $1 \le j \le N$.

Let us now consider the single-stage process where each player makes
precisely one play. The determination of optimal play, defined as that
which maximizes return, is straightforward if A is required to move
before B and if B can use this information. If A takes the i-th alternative,
B chooses j = j(i) so as to maximize $b_{ij}$. Consequently A chooses i so as to
maximize $a_{i, j(i)}$. A similar rule determines the choice of j if B is required
to move first.
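
The rule for the case where A moves first can be sketched in a few lines. This is an illustration only: the function name `a_moves_first` and the two 2 x 2 payoff matrices are hypothetical, indices are counted from 0 rather than 1, and ties are resolved in favor of the lowest index, a convention the text does not specify.

```python
def a_moves_first(a, b):
    """Return (i, j): A's play and B's reply when B observes A's move.

    a[i][j] is the return to A and b[i][j] the return to B.
    """
    def reply(i):
        # B's best answer j(i) to A's play i: maximize b[i][j] over j.
        return max(range(len(b[i])), key=lambda j: b[i][j])

    # A anticipates the reply and maximizes a[i][j(i)] over i.
    i = max(range(len(a)), key=lambda i: a[i][reply(i)])
    return i, reply(i)

# Hypothetical payoff matrices for illustration.
a = [[2, 0],
     [1, 3]]
b = [[0, 1],
     [2, 0]]
print(a_moves_first(a, b))
```

In this example A forgoes the row containing his largest entry, since B's reply in that row would not leave him with it.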

1 The successors of the algebraic A, B, and C discussed by S. Leacock.
The only interesting case is that where both players are required to
move simultaneously, without knowing of the other's choice.

In these circumstances, they can protect themselves by mixing their
choices, which is to say they will randomize their choices in a certain
fashion. Let us assume then that A makes the i-th choice with probability
$p_i$ and B the j-th choice with probability $q_j$. The vector
$p = (p_1, p_2, \ldots, p_M)$ specifies A's probability distribution and the vector
$q = (q_1, q_2, \ldots, q_N)$ specifies B's probability distribution.

As in our discussion of the stochastic processes of the previous chapters,
we can no longer speak of the return, but must agree to consider some
average return. The simplest such, as usual, is the expected return. The
expected return to A will be

(2)  $E_A(p, q) = \sum_{i=1}^{M} \sum_{j=1}^{N} a_{ij}\, p_i q_j$,

while that for B is

(3)  $E_B(p, q) = \sum_{i=1}^{M} \sum_{j=1}^{N} b_{ij}\, p_i q_j$.

The first player would like to choose p so as to maximize $E_A$, while the
second player would like to choose q so as to maximize $E_B$.
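
The two expected returns are plain double sums and are easy to evaluate. The sketch below assumes a hypothetical 2 x 2 pair of payoff matrices, chosen here only for illustration; the helper name `expected_return` is likewise not from the text.

```python
def expected_return(payoff, p, q):
    """Sum over i, j of payoff[i][j] * p[i] * q[j]."""
    return sum(payoff[i][j] * p[i] * q[j]
               for i in range(len(p))
               for j in range(len(q)))

# Hypothetical payoff matrices: a for player A, b for player B.
a = [[1, -1],
     [-1, 1]]
b = [[-1, 1],
     [1, -1]]

p = [0.5, 0.5]   # A's probability distribution
q = [0.5, 0.5]   # B's probability distribution

E_A = expected_return(a, p, q)
E_B = expected_return(b, p, q)
```

With both players mixing evenly, every entry is weighted by 1/4 and the gains and losses cancel; a pure choice such as p = (1, 0), q = (0, 1) simply picks out the single entry in row 1, column 2.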

§  3.  The  min-max  theorem 

In  order  to  obtain  definitive  results,  we  must  assume  that  the  players 
are  in  direct  opposition,  expressed  by  the  relation 

(1)  $b_{ij} = -a_{ij}$.

In  this  case,  the  game  is  called  zero-sum,  and  only  in  this  case  does  a 
satisfactory  general  theory  exist.  We  then  have 

(2)  EB(p,q)=—EA(p,q), 

from  which  it  is  clear  that  any  choice  of  p  and  q  which  increases  EA  (p,  q) 
decreases  EB(p,  q),  and  vice  versa. 

It is sufficient then to consider $E_A(p, q)$ in our further discussion. We
can, using this expression, define two values of the game,

(3)  $V_A = \min_q \max_p E_A(p, q)$,

     $V_B = \max_p \min_q E_A(p, q)$.

The first is the expected return to A if B is required to choose q before A
chooses p, while the second is the value to A if the situation is reversed.
It is a remarkable fact (the min-max theorem of von Neumann), the
basic result in the theory of games, that

(4)  VA=VB. 

This  common  value  is  called  the  value  of  the  game.  We  shall  assume  this 
result  without  proof  here. 

The  interpretation  of  this  result  is  that  A  can  announce  his  probability 
distribution  p  in  advance,  and  likewise  B  can  announce  q,  without  either 
gaining  from  this  advance  knowledge.  This  is  neither  an  intuitive,  nor  a 
trivial,  result,  but  it  is  true. 
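
The equality of the two values can be checked numerically on a small example. The sketch below is illustrative only: the 2 x 2 payoff matrix is hypothetical, and the search over mixed strategies is restricted to a grid of probabilities with denominator 70, chosen here so that the optimal mixtures of this particular matrix lie on the grid. It uses the fact that, for a fixed mixture of one player, the expected return is linear in the other player's mixture, so the inner maximum or minimum is attained at a pure play.

```python
from fractions import Fraction

# Hypothetical payoff matrix (a_ij) for player A; B receives the negative.
A = [[3, -1],
     [-2, 1]]

GRID = [Fraction(k, 70) for k in range(71)]   # probabilities 0, 1/70, ..., 1

def row_payoffs(q1):
    """Expected return to A for each pure row when B plays (q1, 1 - q1)."""
    return [A[i][0] * q1 + A[i][1] * (1 - q1) for i in range(2)]

def col_payoffs(p1):
    """Expected return to A for each pure column when A plays (p1, 1 - p1)."""
    return [A[0][j] * p1 + A[1][j] * (1 - p1) for j in range(2)]

# Min over q of Max over p, and Max over p of Min over q.
V_A = min(max(row_payoffs(q1)) for q1 in GRID)
V_B = max(min(col_payoffs(p1)) for p1 in GRID)

# The same two quantities when only pure plays are allowed.
pure_min_max = min(max(A[i][j] for i in range(2)) for j in range(2))
pure_max_min = max(min(A[i][j] for j in range(2)) for i in range(2))

print(V_A, V_B, pure_min_max, pure_max_min)
```

For this matrix the pure-play quantities differ, while the mixed values coincide, which is the content of the theorem in the 2 x 2 case.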

§  4.  Continuous  games 

Let  us  now  suppose  that  in  place  of  choosing  one  of  a  discrete  set  of 
moves,  A  must  choose  from  a  continuum  and  similarly  B.  As  a  simple 
example,  suppose  that  A  must  choose  a  real  number  x  in  the  interval 
[0,  1],  and  B  similarly  a  real  number  y  in  [0,  1].  Considering  the  zero-sum 
case only, there is now a payoff function $K(x, y)$ which measures the
value of this set of moves to A, with $-K(x, y)$ the value to B.

If  A  chooses  a  distribution  F  (x)  to  govern  the  frequency  with  which 
he  chooses  x,  and  B  the  distribution  function  G  (y),  the  expected  gain  to 
A  will  be 

(1)  $V_A = \int_0^1 \int_0^1 K(x, y) \, dF(x) \, dG(y)$.

The  continuous  analogue  of  the  min-max  theorem  is  the  result: 

(2)  $\max_F \min_G V_A = \min_G \max_F V_A$,

where the variation is over the space of functions defined by

(3)  (a)  $dF \ge 0$,  $\int_0^1 dF(x) = 1$,

     (b)  $dG \ge 0$,  $\int_0^1 dG(y) = 1$,

provided that $K(x, y)$ is jointly continuous in x, y over the unit square.¹
If $K(x, y)$ is not continuous, (2) need not hold, and $V_A(F, G)$ need not
even exist for all F and G.

¹ This theorem is a very fine illustration of the utility of the Stieltjes integral,
since the result is not valid if we consider only functions F(x) and G(y) which are
integrals, i.e., $dF(x) = \phi(x) \, dx$, $dG(y) = \psi(y) \, dy$.

§ 5. Finite resources

In  many  situations  involving  multi-stage  play,  the  above  model  is  not 
satisfactory.  This  is  particularly  so  in  multi-stage  processes  where  each 
player  possesses  finite  resources.  Here  the  choice  of  plays  depends  upon 
the  quantity  of  resources  available,  and  the  game  terminates  when  either 
player  has  no  resources.  Consequently  we  cannot  consider  the  set  of  N 
games  as  consisting  of  N  disjoint  plays. 

Let us consider a simple example. Suppose that A has a quantity x and
B a quantity y. At each stage each player may allocate 1 or 2 units of his
resources, with the return $a_{ij}$ to A if A makes an allocation of i and B an
allocation of j, and a return of $-a_{ij}$ to B, where i, j = 1, 2.

Here, for the sake of initial simplicity, the return $a_{ij}$ is in units different
from those of x and y, and so cannot be reconverted into resources.

Let us take the process to terminate as soon as either side has no
resources and suppose that each plays to maximize his total return.
Assume that we may define the function

(1)  f(x, y) = expected return from the process to A when A has x and
     B has y initially, and each employs an optimal policy.

On the first move, A mixes his choices according to the probability
distribution $p = (p_1, p_2)$ and B according to the distribution $q = (q_1, q_2)$,
where p and q will, in general, be functions of x and y.

An enumeration of possibilities yields the relation

(2)  $f(x, y) = \sum_{i=1}^{2} \sum_{j=1}^{2} p_i q_j [a_{ij} + f(x - i, y - j)]$,

for an optimal policy, assuming for the moment that the principle of
optimality is equally valid for multi-stage games. A proof of this will be
given in § 9. Thus the functional equation for f(x, y) is

(3)  $f(x, y) = \max_p \min_q \left( \sum_{i=1}^{2} \sum_{j=1}^{2} p_i q_j [a_{ij} + f(x - i, y - j)] \right)$

     $= \min_q \max_p \left( \sum_{i=1}^{2} \sum_{j=1}^{2} p_i q_j [a_{ij} + f(x - i, y - j)] \right)$

for x, y > 0, with the boundary conditions

(4)  $f(x, y) = 0$ if $x \le 0$ or $y \le 0$.
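
The recurrence (3) with the boundary conditions (4) can be evaluated directly for integer x and y. The following is a minimal sketch under stated assumptions: the returns in `A_PAYOFF` are invented for illustration, and the helper `value_2x2` (a hypothetical name) uses the standard closed form for the value of a 2 x 2 zero-sum game. Each stage is treated as a 2 x 2 game whose entries are $a_{ij} + f(x - i, y - j)$.

```python
from functools import lru_cache

# Hypothetical returns a_ij to A when A allocates i and B allocates j.
A_PAYOFF = {(1, 1): 1.0, (1, 2): -2.0, (2, 1): -1.0, (2, 2): 3.0}


def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        return lower                      # saddle point in pure strategies
    return (a * d - b * c) / ((a - b) + (d - c))


@lru_cache(maxsize=None)
def f(x, y):
    """Expected return to A under optimal play, following (3) and (4)."""
    if x <= 0 or y <= 0:
        return 0.0                        # boundary condition (4)
    stage = [[A_PAYOFF[i, j] + f(x - i, y - j) for j in (1, 2)] for i in (1, 2)]
    return value_2x2(stage)
```

With these illustrative returns, `f(1, 1)` is the value of the one-stage game alone, while `f(2, 2)` already reflects the possibility of a second stage after the allocation (1, 1).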

§ 6. Games of survival

Returning to the game described in § 2, let us take A to have x, B to
have y and assume that the returns $a_{ij}$ and $b_{ij}$ are in the same units as x
and y, say dollars, and that $b_{ij} = -a_{ij}$, the zero-sum case. Let us now
suppose that the game is continued until one player is ruined, and that
each player attempts to ruin the other. A game of this type we call a game
of survival. It is a generalized “gambler’s ruin” problem.

Assuming  the  existence  of  the  function 

(1)  f(x,  y)  =  probability  that  A  ruins  B  when  A  has  x,  B  has  y,  and 

each  player  employs  an  optimal  policy, 

and proceeding as before, we obtain the functional equation

(2)  $f(x, y) = \max_p \min_q \sum_{i,j} f(x + a_{ij}, y - a_{ij}) \, p_i q_j$

     $= \min_q \max_p \sum_{i,j} f(x + a_{ij}, y - a_{ij}) \, p_i q_j$,

x, y > 0, with the boundary conditions

(3)  $f(x, y) = 1$,  $x > 0$, $y \le 0$,

     $f(x, y) = 0$,  $x \le 0$, $y > 0$.

Since the game is zero-sum, the quantity of resources in the game
remains constant. Thus the state of the process is specified by x, the
quantity possessed by A. Setting x + y = c, and f(x, y) = f(x), we obtain
the simpler equation

(4)  $f(x) = \max_p \min_q \sum_{i,j} f(x + a_{ij}) \, p_i q_j = \min_q \max_p \sum_{i,j} f(x + a_{ij}) \, p_i q_j$,

for 0 < x < c, with f(x) = 0 for $x \le 0$, f(x) = 1 for $x \ge c$.
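
Equation (4) defines f(x) implicitly, so a numerical treatment has to iterate. The following is a minimal sketch, assuming integer holdings and integer payoffs $a_{ij}$; the function names are hypothetical, and the simple sweep used here is an illustrative choice whose convergence is not asserted by the text in general. For the symmetric matrix used below, the stage game at x has value $\frac{1}{2}[f(x+1) + f(x-1)]$, so the iteration reduces to the classical gambler's ruin problem with solution x/c.

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        return lower                      # saddle point in pure strategies
    return (a * d - b * c) / ((a - b) + (d - c))


def survival_probabilities(a, c, sweeps=500):
    """Iterate equation (4) on the integer holdings x = 0, 1, ..., c.

    a[i][j] is the (integer) amount A wins from B on one play; c = x + y is
    the constant total. Returns the list [f(0), f(1), ..., f(c)].
    """
    def clamp(g, x):
        # Boundary conditions: f = 0 once A is ruined, f = 1 once B is ruined.
        if x <= 0:
            return 0.0
        if x >= c:
            return 1.0
        return g[x]

    f = [0.0] * c + [1.0]                 # initial guess: 0 in the interior
    for _ in range(sweeps):
        interior = [
            value_2x2([[clamp(f, x + a[i][j]) for j in range(2)] for i in range(2)])
            for x in range(1, c)
        ]
        f = [0.0] + interior + [1.0]
    return f


# Matching pennies for one unit per play (an illustrative choice of a_ij).
ruin = survival_probabilities([[1, -1], [-1, 1]], c=4)
```

The resulting list `ruin` agrees with x/c at each x, as the gambler's ruin interpretation suggests.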

§  7.  Pursuit  games 

Another interesting class of games is that involving the pursuit of
one player by another. In some cases there is a question as to whether
one player can catch the other; in other cases, where capture is certain,
the problem is to determine the choice of paths for one player which
minimizes the time of capture and for the other player a path which
maximizes the time of capture.

The continuous versions of these problems are quite difficult to
formulate rigorously, and as a consequence most of the results obtained in
this connection pertain to the discrete version.

Consider the following simple problem. The two players, A and B, are
situated at the points $k\Delta$, $l\Delta$ respectively on the line, where $\Delta > 0$ and
k and l are integers or zero. At each move of the game, each player has
the choice of moving one unit to the right or to the left. Moves are made
simultaneously with full information as to the positions of each player.
After each move there is a pay-off from B to A of an amount g(d), where
$d = |k - l|\Delta$, the distance between the players. Furthermore, there is a
probability $1 - a(d)$ that the process terminates on that move.

The total pay-off of the multi-stage game is taken to be the expected
value of the quantity that B pays A before the process terminates. Once
again assume that the function

(1)  f(d) = expected pay-off if A and B are initially d units apart and
     both employ optimal strategies,

exists. Then proceeding as before, we obtain the functional equation

(2)  $f(d) = a(d) \max_p \min_q [p_1 q_1 f(d) + p_2 q_2 f(d) + p_1 q_2 f(d - 2) + p_2 q_1 f(d + 2)] + g(d)$

     $= a(d) \min_q \max_p [\ldots] + g(d)$,

where $p_1$, $p_2$ are the probabilities of A going to the left or the right
respectively on any move, and $q_1$, $q_2$ are the corresponding probabilities for
B. In general, optimal $p_1$, $p_2$, $q_1$ and $q_2$ will depend upon d.

§  8.  General  formulation 

Let  us  now  describe,  in  some  generality,  a  class  of  multi-stage  games  we 
wish  to  analyze.  At  any  stage  of  the  game,  the  states  of  the  two  players, 
A  and  B,  are  represented  by  m-dimensional  vectors,  x  and  y,  which  we  can 
think  of  as  “resources.” 

In order to avoid for the moment the conceptual difficulties of infinite
processes, we shall first consider a finite process. At the beginning of each
stage of an N-stage process, A allocates a certain quantity of his
resources, a vector u, and B a certain quantity of his resources, a vector v; this
will be represented symbolically by the notation $0 \le u \le x$, $0 \le v \le y$,
where the inequalities hold component-wise.

As a result of this allocation, there are two consequences. A receives a
payoff of $R(u, v; x, y)$, a scalar function, and B a payoff of $-R(u, v; x, y)$,
a zero-sum process. In addition to these payoffs, there is an alteration
in their resources; x is transformed into $T(x, y; u, v)$, and y becomes
$T'(x, y; u, v)$. The process now continues in the same fashion for (N − 1)
additional stages.

The total return to A of the N-stage process is

(1)  $R_N = R_N(u, u_1, \ldots, u_{N-1}; v, v_1, \ldots, v_{N-1}; x, y)$

     $= R(u, v) + R(u_1, v_1) + \ldots + R(u_{N-1}, v_{N-1})$.

There are several ways we can treat this N-stage process. One extreme
regards the N-stage game as a single-stage game of complicated type,
requiring a choice of a set of vectors $(u, u_1, \ldots, u_{N-1})$ by A and a set
$(v, v_1, \ldots, v_{N-1})$ by B, where the choice of $u_k$ and $v_k$ is dependent upon
the choice of $u, u_1, \ldots, u_{k-1}$, $v, v_1, \ldots, v_{k-1}$. Alternatively, we can
employ the functional equation approach. For the case of unbounded
processes, or processes involving stochastic interaction, the recurrence
technique is, in general, the only feasible one. For the case of finite
deterministic processes, this technique is usually simpler analytically and
computationally.

We shall assume that $R(u, v; x, y)$ is a continuous function of u and v
for all finite values of u and v, x and y, and that similarly $T(x, y; u, v)$,
$T'(x, y; u, v)$ are continuous functions of x, y, u, and v for all finite values
of the vector variables.

The general case where only boundedness and measurability of the
functions are assumed may be handled using the same principles, at the
expense of introducing Sup-Inf operators in place of Max-Min. The
particular case where x, y, u, v, T, T' assume only finite sets of values is also
interesting to consider, and has the advantage of avoiding continuity
considerations.

One advantage of considering the N-stage process as a single-stage
process, as described above, is that it permits us to define the multi-stage
game, and thus its value, precisely on the basis of known results for the
single-stage game. Once having defined the game,
we can prove that recurrence techniques are applicable.

The value of the N-stage game described above is given by the
expression

(2)  $v_N = \max_G \min_{G'} \left[ \iint R_N \, dG(u, u_1, \ldots, u_{N-1}) \, dG'(v, v_1, \ldots, v_{N-1}) \right]$

     $= \min_{G'} \max_G [\ldots]$,

where G and G' are distribution functions over regions of quite
complicated form defined by the inequalities

(3)  $0 \le u \le x$,  $0 \le v \le y$,

     $0 \le u_1 \le T$,  $0 \le v_1 \le T'$,

     ...

     $0 \le u_{N-1} \le T_{N-2}$,  $0 \le v_{N-1} \le T'_{N-2}$.

The quantities T and T' depend upon x, y, u, and v; $T_1$, $T_1'$ depend upon
x, y, u, v, $u_1$, $v_1$, and so on.

§ 9. The principle of optimality and functional equations

Let  us  now  change  our  notation,  replacing  x  by  P  and  y  by  P',  in  order 
to  consider  more  general  situations  where  x  and  y  are  not  necessarily 
vectors  whose  elements  are  quantities  of  resources. 

Since $v_N$ depends only upon the initial states, we may define the
sequence of functions

(1)  $f_N(P, P') = v_N$,  N = 1, 2, ...

Assuming for the moment that the principle of optimality is valid for
multi-stage games, we obtain the following recurrence relations.*

(2)  $f_1(P, P') = \max_G \min_{G'} \left[ \iint_{0 \le u \le P,\ 0 \le v \le P'} R(u, v) \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$,

     $f_{N+1}(P, P') = \max_G \min_{G'} \left[ \iint_{0 \le u \le P,\ 0 \le v \le P'} [R(u, v) + f_N(T, T')] \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$.

That this principle is valid for one-person processes where we are
attempting to maximize a return, or minimize a “cost”, is clear by
contradiction. Since its validity may not be as obvious for game processes, let
us present a brief proof for the sake of completeness.

The recurrence relation in (2) provides a sequence, not necessarily
unique, of pairs of distribution functions, $\{G_N(u; P, P'), G_N'(v; P, P')\}$,
which furnish the sequence $\{f_N(P, P')\}$. In order to show that the
function $f_N(P, P')$ is actually the value of the N-stage game, it is sufficient to
show that A can guarantee an expected return of $f_N(P, P')$ if he chooses
u at the first stage of an N-stage process in accordance with the
distribution function $G_N(u; P, P')$, when the states of A and B are described by
P and P', respectively, and similarly that B can guarantee an expected
loss of not more than $f_N(P, P')$.

To demonstrate this, consider the one-person N-stage process in which
A employs the fixed strategy represented by the sequence of distribution
functions $\{G_k(u; P, P')\}$, k = 1, 2, ..., N, and B attempts to minimize
A's expected N-stage return. It is sufficient to consider this process,
since any other policy employed by B yields a larger expected return
for A. Let

(2)  $w_N(P, P')$ = N-stage expected return to A when A employs the
     fixed strategy $\{G_k(u; P, P')\}$, B employs a minimizing
     strategy, and A and B are in the states P and P'
     initially.

* To simplify the notation, we shall write $R(u, v)$ for $R(u, v; P, P')$.

Then we have the recurrence relations

(3)  $w_1(P, P') = \inf_{G'} \int_{0 \le v \le P'} \left[ \int_{0 \le u \le P} R(u, v) \, dG_1(u; P, P') \right] dG'(v)$,

     $w_{N+1}(P, P') = \inf_{G'} \int_{0 \le v \le P'} \left[ \int_{0 \le u \le P} [R(u, v) + w_N(T, T')] \, dG_{N+1}(u; P, P') \right] dG'(v)$,

upon employing the principle of optimality for the one-person process.

Considering the origin of the function $G_1$, we see that the minimum in
the relation for $w_1(P, P')$ in (3) is attained by the function $G' = G_1'$, not
uniquely in general. Hence,

(4)  $w_1(P, P') = v_1(P, P')$.

Since $w_1 = v_1$, the relation for $w_2$ yields in the same way the fact that
$w_2 = v_2$, and thus, inductively, we see that

(5)  $w_N(P, P') = v_N(P, P')$.

In precisely the same way we show that if B employs the strategy
$G_N'(v; P, P')$, A cannot obtain more than $v_N(P, P')$. Hence $v_N(P, P')$ is
the value of the N-stage game.

§  10.  More  general  process 

Before  presenting  some  precise  statements  concerning  the  processes  we 
have  discussed  above,  let  us  consider  a  sequence  of  more  general  processes 
which  may  be  treated  by  means  of  the  same  techniques  we  shall  employ 
below. 

Consider, to begin with, an infinite process of the type described in § 8
in which we allow the transformations T and T', as well as the return R,
to depend upon the stage.

We then consider the functions

(1)  f(P, P'; k) = the value to A of the infinite process beginning at the
     k-th stage when A and B possess P and P' at this
     stage, and both employ optimal strategies.

This sequence, with the usual proviso relating to existence, satisfies the
recurrence relation

(2)  $f(P, P'; k) = \max_G \min_{G'} \left[ \iint_{0 \le u \le P,\ 0 \le v \le P'} [R(u, v; k) + f(T_k, T_k'; k + 1)] \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$.

Let us now complicate the process to a further degree. We have
assumed, in the above formulation, that the interaction between the players
was perfectly determined once u and v were chosen. In a variety of
processes, a choice of u and v determines a distribution of outcomes, which
is to say the interaction is stochastic rather than deterministic. Let
$K_k(z, t, t'; u, v)$ denote the distribution function, where z is the value of
$R_k(u, v)$, t the value of $T_k$ and t' the value of $T_k'$.

The functional equation of (2) is replaced by

(3)  $f(P, P'; k) = \max_G \min_{G'} \left[ \iint_{0 \le u \le P,\ 0 \le v \le P'} \left[ \int [z + f(t, t'; k + 1)] \, dK_k \right] dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$.

Finally, let us consider the case where we are concerned with a
nonlinear function of the total return, R, rather than the total return itself.
A particularly important situation is that where A wishes to maximize
the probability of achieving a return of at least $R_0$, a specified constant.
Another  interesting  utility  function  is  e"". 

Let us assume that A wishes to maximize the expected value of $\phi(R)$,
where $\phi$ is a given function of R. To describe this nonlinear situation, we
must introduce an additional state variable, a, the total return obtained
by A from the previous stages of the process. Defining the function
$f(P, P', a; k)$ essentially as in (1), we obtain the associated functional
equation

(4)  $f(P, P', a; k) = \max_G \min_{G'} \left[ \iint_{0 \le u \le P,\ 0 \le v \le P'} \left[ \int f(t, t', a + z; k + 1) \, dK_k \right] dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$.

None  of  these  functional  equations  will  be  discussed  here  in  connection 
with  the  existence  and  uniqueness  of  solutions  since  the  basic  approach 
is  the  same  for  all  cases. 




§ 11. A basic lemma

Let us present a simple but extremely useful inequality which exhibits
the quasi-linearity of the transformation

(1)  $L(f) = \max_G \min_{G'} T(P, P'; f; G, G') = \min_{G'} \max_G T$.

It  will  play  the  same  role  in  the  existence  and  uniqueness  proofs  of  this 
chapter  that  Lemma  1  of  Chapter  4  played  in  that  chapter. 

Lemma 1.⁴ Let

(2)  $L(f) = \max_G \min_{G'} \left[ \iint_{u \in S,\ v \in S'} [R(u, v) + h(P, P'; u, v) f(T, T')] \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$,

     $L_1(F) = \max_G \min_{G'} \left[ \iint_{u \in S,\ v \in S'} [R_1(u, v) + h(P, P'; u, v) F(T, T')] \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$.

Then

(3)  $|L(f) - L_1(F)| \le \max_{u \in S} \max_{v \in S'} \left[ |R(u, v) - R_1(u, v)| + |h(P, P'; u, v)| \, |f(T, T') - F(T, T')| \right]$.

Proof: Let us write

(4)  $L(f) = \max_G \min_{G'} T(P, P'; f; G, G') = \min_{G'} \max_G T(P, P'; f; G, G')$,

     $L_1(F) = \max_G \min_{G'} T_1(P, P'; F; G, G') = \min_{G'} \max_G T_1(P, P'; F; G, G')$.

Let $(G_1, G_1')$ be a pair of functions yielding the value $L(f)$, and let
$(G_2, G_2')$ be a pair of functions yielding the value $L_1(F)$. Then, by virtue
of the saddle-point property, we have the following chain of equalities
and inequalities:

(5)  $L(f) = T(P, P'; f; G_1, G_1') \ge T(P, P'; f; G_2, G_1')$,

     $L(f) \le T(P, P'; f; G_1, G_2')$,

     $L_1(F) = T_1(P, P'; F; G_2, G_2') \ge T_1(P, P'; F; G_1, G_2')$,

     $L_1(F) \le T_1(P, P'; F; G_2, G_1')$.

Combining these inequalities we have

(6)  $L(f) - L_1(F) \ge T(P, P'; f; G_2, G_1') - T_1(P, P'; F; G_2, G_1')$,

     $L(f) - L_1(F) \le T(P, P'; f; G_1, G_2') - T_1(P, P'; F; G_1, G_2')$.

The inequalities in (6) yield

(7)  $L(f) - L_1(F) \ge \iint_{u \in S,\ v \in S'} [R(u, v) - R_1(u, v) + h(P, P'; u, v) [f(T, T') - F(T, T')]] \, dG_2(u) \, dG_1'(v)$,

     $L(f) - L_1(F) \le \iint_{u \in S,\ v \in S'} [R(u, v) - R_1(u, v) + h(P, P'; u, v) [f(T, T') - F(T, T')]] \, dG_1(u) \, dG_2'(v)$.

Using as in Chapter 4 the fact that $a \le c \le b$ implies $|c| \le \max(|a|, |b|)$,
we obtain from (7) the further inequality

(8)  $|L(f) - L_1(F)| \le \max \Big\{ \iint_{u \in S,\ v \in S'} [\, |R(u, v) - R_1(u, v)| + |h(P, P'; u, v)| \, |f(T, T') - F(T, T')| \,] \, dG_2(u) \, dG_1'(v),$

     $\iint_{u \in S,\ v \in S'} [\, |R(u, v) - R_1(u, v)| + |h(P, P'; u, v)| \, |f(T, T') - F(T, T')| \,] \, dG_1(u) \, dG_2'(v) \Big\}$,

from which (3) follows immediately.

⁴ It is assumed that max-min = min-max for each transformation. A similar
result holds for the one-sided max-min operator; see § IS.

It  is  easy  to  make  the  modifications  required  to  obtain  the  analogous 
result  for  the  case  where  Max  Min  is  replaced  by  Sup  Inf. 
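
When u and v each range over two values, the bracketed integrand in (2) becomes a 2 x 2 matrix and L is simply the value of the corresponding matrix game, so inequality (3) can be tested numerically. The following is a minimal sketch under that finiteness assumption; the function names are hypothetical, the random integer data are invented, and exact rational arithmetic is used so that the comparison is not blurred by rounding. A check of this kind illustrates the lemma but is of course no substitute for the proof.

```python
import random
from fractions import Fraction


def value_2x2(m):
    """Exact value of a 2x2 zero-sum matrix game with integer entries."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        return Fraction(lower)            # saddle point in pure strategies
    return Fraction(a * d - b * c, (a - b) + (d - c))


def worst_excess(trials=2000, seed=0):
    """Largest observed |L(f) - L1(F)| minus the right side of (3).

    fT and FT hold the values f(T, T') and F(T, T') at the four successor
    states reached by the four pairs (u, v). The result should never be
    positive if the lemma holds.
    """
    rng = random.Random(seed)

    def draw(lo, hi):
        return [[rng.randint(lo, hi) for _ in range(2)] for _ in range(2)]

    worst = None
    for _ in range(trials):
        R, R1 = draw(-5, 5), draw(-5, 5)
        h = draw(-1, 1)
        fT, FT = draw(-5, 5), draw(-5, 5)
        M = [[R[u][v] + h[u][v] * fT[u][v] for v in range(2)] for u in range(2)]
        M1 = [[R1[u][v] + h[u][v] * FT[u][v] for v in range(2)] for u in range(2)]
        bound = max(
            abs(R[u][v] - R1[u][v]) + abs(h[u][v]) * abs(fT[u][v] - FT[u][v])
            for u in range(2) for v in range(2)
        )
        excess = abs(value_2x2(M) - value_2x2(M1)) - bound
        worst = excess if worst is None else max(worst, excess)
    return worst
```

A non-positive return value from `worst_excess()` means that no sampled instance violated (3).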

§  12.  Existence  and  uniqueness 

Before stating our results, let us introduce some notation. Let P and P'
represent n- and n'-dimensional vectors defined over regions D and D',
respectively, each containing the origin in its respective space. For all
values of u, v, P and P', the transformed vectors $T(P, P'; u, v)$,
$T'(P, P'; u, v)$ are required to lie within these same domains, where u
and v are k- and k'-dimensional choice vectors, respectively, constrained
to domains S and S', which may or may not depend upon P and P'.
Since we shall be dealing with shrinking transformations in the theorem
below, there is no loss of generality in taking D and D' to be finite.

In each space, let us introduce the norm, $\|P\|$, equal to the sum of
the absolute values of the components of P,

(1)  $\|P\| = \sum_{i=1}^{n} |p_i|$,

     $\|P'\| = \sum_{i=1}^{n'} |p_i'|$.

Actually, these norms need not be identical, and, in some situations, it
might be useful to consider norms molded to the structure of the
functional equation, rather than standard norms of the above type.

The functional equation we shall consider in some detail is

(2)  $f(P, P') = \max_G \min_{G'} \left[ \iint_{u \in S,\ v \in S'} [R(u, v) + h(P, P'; u, v) f(T, T')] \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$,

where $T = T(P, P'; u, v)$, $T' = T'(P, P'; u, v)$.

To simplify our notation, let us represent the operator appearing
within the outer brackets in equation (2) by $T(P, P'; f; G, G')$. The
equation in (2) then assumes the form

(3)  $f(P, P') = \max_G \min_{G'} T(P, P'; f; G, G')$

     $= \min_{G'} \max_G T(P, P'; f; G, G')$.

There is a question as to whether this should be referred to as one
equation or as a pair of equations. We shall refer to (3) as "an equation."

The result we shall demonstrate is

Theorem 1. Consider the equation in (2) under the following assumptions:

(4)  (a)  The functions $R(u, v)$, $h(P, P'; u, v)$, $T(P, P'; u, v)$ and
          $T'(P, P'; u, v)$ are continuous functions of P and P', u and v, in any
          bounded domain of the variables.

     (b)  The choice domains, $S(P, P')$, $S'(P, P')$, vary continuously with
          P and P'.

     (c)  T and T' are shrinking transformations, i.e.,

          $\max_{u \in S,\ v \in S'} (\|T\| + \|T'\|) \le k (\|P\| + \|P'\|)$,

          where k is a fixed constant less than 1.

     (d)  Let $\sum_{n=1}^{\infty} w(k^n c) < \infty$ for all c > 0, where

          $w(c) = \max_{\|P\| + \|P'\| \le c} \left( \max_{u \in S,\ v \in S'} |R(u, v)| \right)$.

     (e)  $\max_{u, v, P, P'} |h(u, v, P, P')| \le 1$.

If the above conditions are satisfied, we can assert that there is a unique
solution of the equation in (2) within the class of functions f(P, P') which
are continuous for all finite P and P' and vanish when P and P' are both
null vectors.

This solution may be found by the method of successive approximations,

(5)  $f_0(P, P') = \max_G \min_{G'} \left[ \iint_{u \in S,\ v \in S'} R(u, v) \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$,

     $f_{n+1}(P, P') = \max_G \min_{G'} T(P, P'; f_n; G, G')$

     $= \min_{G'} \max_G T(P, P'; f_n; G, G')$,  $n \ge 0$.

The solution is obtained as the limit $f(P, P') = \lim_{n \to \infty} f_n(P, P')$, in any
bounded region of (P, P') space.

We shall further demonstrate

Theorem 2. Under the hypothesis of Theorem 1, a set of functions {G(u),
G'(v)} furnished by the functional equation constitute a set of optimal
strategies for A and B, respectively, in the multi-stage game described above.


§  13.  Proof  of  results 

Let us now proceed to the proofs of these results. Let

(1)  $f_0(P, P') = \max_G \min_{G'} \left[ \iint_{u \in S,\ v \in S'} R(u, v) \, dG(u) \, dG'(v) \right]$

     $= \min_{G'} \max_G [\ldots]$,

and

(2)  $f_{n+1}(P, P') = \max_G \min_{G'} T(P, P'; f_n; G, G') = \min_{G'} \max_G T$,

where T is defined as in (4.2) and (4.4).

By virtue of our assumptions concerning the coefficient functions and
the domains S and S', we can assert the existence of the saddle point in
(1), and the continuity of $f_0(P, P')$. Inductively, then, all the $f_n(P, P')$
exist and are continuous for all finite P and P'.

Let us now show that the sequence $\{f_n\}$ converges uniformly in any
finite portion of the (P, P')-regions. Using Lemma 1 we obtain the
inequality

(3)  $|f_{n+1}(P, P') - f_n(P, P')| \le \max_G \max_{G'} \left[ \iint |f_n(T, T') - f_{n-1}(T, T')| \, dG(u) \, dG'(v) \right]$,

     n = 2, 3, ....

Define the new sequence

(4)  $u_{n+1}(c) = \max_{\|P\| + \|P'\| \le c} |f_{n+1}(P, P') - f_n(P, P')|$.

Then (3) yields, using the assumption of (4a) of § 3,

(5)  $u_{n+1}(c) \le u_n(kc)$,  n = 2, 3, ....

Also, we have

(6)  $|f_2(P, P') - f_1(P, P')| \le \max_G \max_{G'} \iint |R(u, v)| \, dG(u) \, dG'(v)$,

whence

(7)  $u_2(c) \le w(c)$.

Using our assumption that $\sum_n w(k^n c) < \infty$, we see that the series
$\sum_n [f_{n+1}(P, P') - f_n(P, P')]$ converges uniformly in any finite region.
Hence $f_n(P, P')$ converges uniformly to a function f(P, P') which
satisfies the original functional equation.

This completes the proof of existence. Let us now turn to a proof of
uniqueness. Let F(P, P') be another solution which is continuous at
P = 0, P' = 0, and bounded in any finite region. We see that F(P, P')
is then actually continuous for all finite P and P', although this fact is
not necessary for our proof. It does simplify it a bit since we can replace
Sup-Inf by Max-Min.

We then have the two equations

(8)  $F(P, P') = \max_G \min_{G'} T(P, P'; F; G, G') = \min_{G'} \max_G T$,

     $f(P, P') = \max_G \min_{G'} T(P, P'; f; G, G') = \min_{G'} \max_G T$.

Applying Lemma 1, we see that

(9)  $|F(P, P') - f(P, P')| \le \max_G \max_{G'} \left[ \iint_{u \in S,\ v \in S'} |F(T, T') - f(T, T')| \, dG \, dG' \right]$.

Let

(10)  $\Delta(c) = \max_{\|P\| + \|P'\| \le c} |F(P, P') - f(P, P')|$.

Then (9) yields the relation

(11)  $\Delta(c) \le \Delta(kc)$,

which, upon iteration, yields $\Delta(c) \le \Delta(k^n c)$, n = 1, 2, .... Since F and
f are both continuous at P = 0, P' = 0, and have the common value 0
there, we see that $\Delta(k^n c) \to 0$ as $n \to \infty$. Hence $\Delta(c) = 0$ and F = f.
This completes the proof of Theorem 1.

§ 14. Alternate proof of existence

In the study of functional equations of this class, the proof of
existence is “cheap,” while the proof of uniqueness requires varying
degrees of effort. As far as the functional equations arising from the
calculus of variations are concerned, the opposite is true; there, existence is
difficult and uniqueness is simple.

Let us indicate how we may establish the existence of a solution of the
Sup-Inf equation in the case where we assume that $R(u, v) \ge 0$ and
$h(P, P'; u, v) \ge 0$. It follows inductively that the sequence $\{f_n(P, P')\}$
is monotone increasing and bounded. Hence the sequence converges to a
function f(P, P').

To show that this function satisfies the functional equation

(1)  $f(P, P') = \sup_G \inf_{G'} T(P, P'; f; G, G')$

     $= \inf_{G'} \sup_G T$,

we proceed as follows. We have

(2)  $f(P, P') \ge f_{n+1}(P, P') = \sup_G \inf_{G'} T(P, P'; f_n; G, G')$,

and thus

(3)  $f(P, P') \ge \sup_G \inf_{G'} T(P, P'; f; G, G')$.

Conversely, utilizing the positivity of the operator, we have

(4)  $f_{n+1}(P, P') \le \sup_G \inf_{G'} T(P, P'; f; G, G')$,

for all n, and, in consequence,

(5)  $f(P, P') \le \sup_G \inf_{G'} T(P, P'; f; G, G')$.

Comparing (3) and (5) we see that we have equality.

§  15.  Successive  approximations  in  general 

The sequence of approximations, $\{f_n(P, P')\}$, used to construct the
function f(P, P') was precisely that obtained from the finite n-stage
processes. This is actually not the best sequence to use if we are interested
only in the infinite stage process. As we have pointed out in previous
pages, approximation in “policy space”, here “strategy space”, is in
many ways a more natural and more important type of approximation.
To justify this and other types of approximations we require

Theorem 3. Under the assumptions of Theorem 1, the sequence defined by

(1)  $f_{n+1}(P, P') = \max_G \min_{G'} T(P, P'; f_n; G, G')$,  n = 0, 1, ...

     $= \min_{G'} \max_G T(P, P'; f_n; G, G')$

converges to the solution of (5.3) for any initial function $f_0(P, P')$ which is
continuous in any finite part of the (P, P')-domain, and equal to zero at
P = 0, P' = 0.

The  proof  is  precisely  the  same  as  that  given  above. 
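
The independence of the limit from the initial function can be seen in a small discrete model. The following is a minimal sketch under loudly stated assumptions: the two states P and P' are merged into a single integer P, each player has two choices, h is taken to be 1, and the payoff table `A` and the shrinking transformation `T` are invented for illustration; none of these names come from the text. Two different starting functions, both vanishing at P = 0, are iterated with the recurrence (1).

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        return lower                      # saddle point in pure strategies
    return (a * d - b * c) / ((a - b) + (d - c))


A = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical stage payoffs per unit of P
M = 20                         # largest state considered


def T(P, u, v):
    """Hypothetical shrinking transformation: the state is at least halved."""
    return P // 2 if u == v else P // 3


def apply_L(f):
    """One step f_{n+1} = Max Min [R + f_n(T)], with R = P * A[u][v], h = 1."""
    return [
        value_2x2([[P * A[u][v] + f[T(P, u, v)] for v in range(2)]
                   for u in range(2)])
        for P in range(M + 1)
    ]


def iterate(f0, n):
    """Apply the recurrence n times starting from the function f0."""
    f = list(f0)
    for _ in range(n):
        f = apply_L(f)
    return f


from_zero = iterate([0.0] * (M + 1), 8)
from_square = iterate([float(P * P) for P in range(M + 1)], 8)
```

In this toy model the two sequences agree after a handful of steps, because integer division drives every state to 0 in finitely many moves; in the continuous setting of the theorem the agreement is attained only in the limit.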

§  16.  Effectiveness  of  solution 

We have established existence and uniqueness of the solution of the
functional equation derived above under the assumption that the infinite
process possessed a value for each player. The question now arises as to whether the
functional equation actually yields sufficient information to allow each
player to obtain this value. If so, we say that the solution is effective, and
theoretically, the functional equation is equivalent to the game.⁶

The solution will be effective under the hypotheses of Theorem 1,
which is to say, continuity.

To show effectiveness, under the hypotheses of Theorem 2, we must
show that if A uses a distribution function $G(u) = G(u; P, P')$ obtained
from a pair (G, G') which yield the min max, then, regardless of what B
may do, A can guarantee himself a return of at least f(P, P').

⁶ In many ways, however, this is not true. Once the functional equation has
been formulated, and the process discarded, we have restricted ourselves to a
certain direction of approach which may not be optimal for the derivation of all
properties of the process. It is well then to keep in mind that the above functional
equation is only one of many possible mathematical descriptions of the process.




MULTI-STAGE  GAMES 


Employing this fixed strategy, A's return will be, at worst, determined
by the solution of the functional equation

(1)  $F(P, P') = \min_{G'} \left[ \iint_{u \in S,\ v \in S'} [R(u, v) + h(P, P'; u, v)\, F(T, T')]\, dG(u)\, dG'(v) \right]$.

It is easy to show, using the techniques of the preceding chapters,
together with the assumptions we have made, that this equation has a
unique continuous solution which is zero at P = 0, P' = 0. Furthermore,
the solution of this equation may be obtained as the limit of the sequence
defined by

(2)  $F_1(P, P') = \min_{G'} \left[ \iint_{u \in S,\ v \in S'} R(u, v)\, dG(u)\, dG'(v) \right]$,

     $F_{n+1}(P, P') = \min_{G'} \iint_{u \in S,\ v \in S'} [R(u, v) + h(P, P'; u, v)\, F_n(T, T')]\, dG(u)\, dG'(v)$.

It is clear, from the derivation of $G(u)$, that $F_1 = f_1$. Hence, inductively,
$F_{n+1} = f_{n+1}$, as defined by (14.1). Thus

(3)  $F(P, P') = \lim_{n \to \infty} F_n = \lim_{n \to \infty} f_n = f(P, P')$.

This  demonstrates  the  effectiveness  of  the  solutions. 

With  reference  to  the  remarks  made  in  §  6  of  Chapter  4,  let  us  now 
establish 

Theorem 4. Let

(1)  $\Delta(c) = \max_{\|P\| + \|P'\| \le c}\ \max_{u \in S,\ v \in S'} |R(u, v) - R'(u, v)|$.

Then, under the hypotheses of Theorem 1, the solutions of

(2)  $f(P, P') = \max_{G} \min_{G'} \iint_{u \in S,\ v \in S'} [R(u, v) + h(P, P'; u, v)\, f(T, T')]\, dG\, dG'$
             $= \min_{G'} \max_{G} [\ldots]$,

     $F(P, P') = \max_{G} \min_{G'} \iint_{u \in S,\ v \in S'} [R'(u, v) + h(P, P'; u, v)\, F(T, T')]\, dG\, dG'$
             $= \min_{G'} \max_{G} [\ldots]$

satisfy the inequality

(3)  $|f(P, P') - F(P, P')| \le \sum_{n=0}^{\infty} \Delta(k^n c)$.

Proof. Applying the Lemma of § 3, we see that

(4)  $|f(P, P') - F(P, P')| \le \max_{G} \max_{G'} \iint_{u \in S,\ v \in S'} [\,|R - R'| + |f(T, T') - F(T, T')|\,]\, dG\, dG'$.

Iteration  of  this  inequality  yields  the  desired  result. 

§  17.  Further  results 

The  results  obtained  in  the  previous  sections  depended  upon  the  fact 
that  the  total  resources  of  the  system  were  diminished  as  a  consequence 
of  the  play  at  any  particular  stage  of  the  game.  Analytically,  we  may 
express  this  by  the  statement  that  the  transformation  (P,  P')  ->  (T,  T') 
is  a  shrinking  transformation. 

Let us now introduce a shrinking transformation in another way by
assuming that

(1)  $|h(P, P'; u, v)| \le k < 1$,

for all admissible P, P', u, and v. Provided that we assume that P and
P' now lie within bounded domains, with T and T' transformations of
these domains into themselves for all u and v, we obtain ready analogues
of the preceding theorems under the assumption of (1). We shall leave
the formulation and proofs of the results as exercises for the reader.

§  18.  One-sided  min-max 

Let us now consider the equation

(1)  $f(P, P') = \min_{v \in S'} \max_{u \in S} [R(u, v) + h(P, P'; u, v)\, f(T, T')]$,

which arises from the allocation process described above if the second
player is required to announce his choice of v to the first player before
each play.

We can obtain an analogue of the basic lemma of § 10 in the following
way. For any function R(u, v) permitting the operations we have

(2)  $\min_{v \in S'} \max_{u \in S} R(u, v) = \min_{v \in S'} \max_{u(v) \in S} R(u, v)$,

where u(v) is now a function which maximizes R(u, v) for fixed v. Let
U(v) be this function.

Let V be a value of v which minimizes R(U(v), v). Then we have the
inequalities

(3)  $R(U(V), V) \le R(U(v), v)$,

     $R(U(V), V) \ge R(u(V), V)$,

for any other admissible values of u and v. This saddlepoint property
yields the analogue of Lemma 1. Having obtained this lemma, the
proofs of existence and uniqueness proceed in a straightforward fashion.
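
The construction of U(v) and V can be illustrated on a finite payoff table. The following is only a minimal sketch, assuming that S and S' are finite index sets and that R is given as a list of rows; the function name `one_sided_min_max` and the sample table are hypothetical and do not come from the text.

```python
def one_sided_min_max(R):
    """R[u][v] is the return when the first player plays u and the second plays v.

    The second player announces v first, so the first player answers with a
    best reply U(v); the second player then picks V to minimize R(U(v), v).
    Returns (value, V, U) where U is the list of best replies.
    """
    n_u, n_v = len(R), len(R[0])
    # U(v): a maximizer of R(u, v) over u for each fixed v
    U = [max(range(n_u), key=lambda u: R[u][v]) for v in range(n_v)]
    # V: a minimizer of R(U(v), v) over v
    V = min(range(n_v), key=lambda v: R[U[v]][v])
    return R[U[V]][V], V, U


# Hypothetical 2 x 3 payoff table, rows indexed by u, columns by v.
R = [[3, 1, 4],
     [2, 5, 0]]
value, V, U = one_sided_min_max(R)
```

For this table both inequalities in (3) can be checked directly: no other announced v gives a smaller best-reply return, and no other reply u to V gives a larger one.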


§  19.  Existence  and  uniqueness  for  games  of  survival 

We  shall  prove  the  following  result : 


Theorem 5. Consider the equation

(1)  $f(x) = \min_{q} \max_{p} [p_1 q_1 f(x-1) + p_1 q_2 f(x+a) + p_2 q_1 f(x+c) + p_2 q_2 f(x-b)]$

          $= \max_{p} \min_{q} [p_1 q_1 f(x-1) + p_1 q_2 f(x+a) + p_2 q_1 f(x+c) + p_2 q_2 f(x-b)]$,

for x = 1, 2, 3, ..., d - 1, associated with the game matrix

(2)  $\begin{pmatrix} -1 & a \\ c & -b \end{pmatrix}$,

where a, b, and c are positive integers, a > 1, and f(x) satisfies the boundary
conditions:

(3)  $f(x) = 0,\ x \le 0; \quad f(x) = 1,\ x \ge d$.

There is a unique function f(x) satisfying the inequalities $0 \le f(x) \le 1$,
which satisfies (1) and (3).


Proof. To simplify the notation, let us set V(f(x)) as the value of the
game whose matrix is

(4)  $\begin{pmatrix} f(x-1) & f(x+a) \\ f(x+c) & f(x-b) \end{pmatrix}$.

The functional equation in (1) has the form

(5)  $f(x) = V(f(x)),\ x = 1, 2, \ldots, d - 1$,
     $f(x) = 0,\ x \le 0$,
     $f(x) = 1,\ x \ge d$.

Let us define the sequence $\{f_n(x)\}$ as follows.

(6)  $f_0(x) = 1,\ x \ge d$,
            $= 0,\ x \le d - 1$,
     $f_{n+1}(x) = V(f_n(x)),\ n = 0, 1, 2, \ldots,\ x = 1, 2, \ldots, d - 1$,
     $f_{n+1}(x) = 1,\ x \ge d$,
              $= 0,\ x \le 0$.

It is clear that $f_1(x) \ge f_0(x)$ for all x, and hence inductively that
$f_{n+1}(x) \ge f_n(x)$. It follows from the fact that $0 \le f_n(x) \le 1$ for all x
and n, that $f_n(x)$ converges as $n \to \infty$ for all x to a function f(x). That
f(x) satisfies (5) is easily seen. This completes the proof of existence.
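
The monotone iteration in (6) is easy to try numerically. What follows is a sketch under stated assumptions, not part of the proof: the helper names `value_2x2` and `survival_values` are hypothetical, the value of the 2 x 2 game is computed from the elementary saddle-point test and mixed-strategy formula, and the parameters a = b = c = 1, d = 4 are chosen only because the resulting symmetric matrix gives the closed form f(x) = x/d (they do not satisfy the condition a > 1 of the theorem).

```python
def value_2x2(m):
    """Value of a 2 x 2 zero-sum game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:          # saddle point in pure strategies
        return maximin
    # no saddle point: classical mixed-strategy value
    return (a * d - b * c) / (a + d - b - c)


def survival_values(a, b, c, d, sweeps=200):
    """Iterate f_{n+1}(x) = V(f_n(x)) for x = 1..d-1, starting from f_0 = 0 there."""
    def at(f, x):
        # boundary conditions: f = 0 for x <= 0, f = 1 for x >= d
        if x <= 0:
            return 0.0
        if x >= d:
            return 1.0
        return f[x]

    f = {x: 0.0 for x in range(1, d)}
    history = [dict(f)]
    for _ in range(sweeps):
        f = {x: value_2x2(((at(f, x - 1), at(f, x + a)),
                           (at(f, x + c), at(f, x - b))))
             for x in range(1, d)}
        history.append(dict(f))
    return f, history


f, history = survival_values(a=1, b=1, c=1, d=4)
```

In this symmetric case each sweep averages the two neighbouring values, so the iterates increase towards x/4, in line with the monotone convergence used in the existence argument.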

Since $f_0(x)$ is a monotone increasing function of x, each function $f_n(x)$
is monotone increasing, and hence f(x) is monotone increasing. Let us
now demonstrate the important result that it is actually strictly monotone.
Upon this fact our proof of uniqueness depends.

We have

(7)  $f(1) = V\begin{pmatrix} 0 & f(1+a) \\ f(1+c) & 0 \end{pmatrix}$.

Since f is monotone increasing, $f(1+a) \ge f(a)$ and $f(1+c) \ge f(c)$.
Hence, if f(a) and f(c) are positive, we have f(1) > 0.

To establish the fact that f(a) and f(c) are positive, let us assume, on
the contrary, that f(x) = 0 for x = 0, 1, 2, ..., k < d, but $f(k+1) \ne 0$.
Then

(8)  $f(k) = V\begin{pmatrix} f(k-1) & f(k+a) \\ f(k+c) & f(k-b) \end{pmatrix} = V\begin{pmatrix} 0 & f(k+a) \\ f(k+c) & 0 \end{pmatrix}$.

Since $f(k+a) \ge f(k+1) > 0$, $f(k+c) \ge f(k+1) > 0$, it follows
that f(k) > 0, which is a contradiction unless k = 0. Thus f(1) > 0.

We have

(9)  $f(2) = V\begin{pmatrix} f(1) & f(a+2) \\ f(c+2) & f(2-b) \end{pmatrix}$.

Since f(1) > 0, $f(a+2) \ge f(a+1)$, $f(c+2) \ge f(c)$, $f(2-b) \ge 0$,
we must have f(2) > f(1), unless f(2 - b) = 0 and the solution is
$p_2 = q_2 = 1$. This is clearly impossible since it yields f(2) = 0 < f(1)
and we know that $f(2) \ge f(1)$.

We thus prove, inductively, that

(10)  $0 = f(0) < f(1) < f(2) < \cdots < f(d) = 1$,

with strict inequality at every step.



The  uniqueness  now  follows  readily.  Let  us  set 

(11)  $T(p, q, f) = p_1 q_1 f(x-1) + p_1 q_2 f(x+a) + p_2 q_1 f(x+c) + p_2 q_2 f(x-b)$.

Let f and g be solutions of

(12)  $f(x) = \min_{q} \max_{p} T(p, q, f) = \max_{p} \min_{q} T(p, q, f)$,

      $g(x) = \min_{q} \max_{p} T(p, q, g) = \max_{p} \min_{q} T(p, q, g)$,

for 0 < x < d, and

(13)  $f(x) = g(x) = 0,\ x \le 0$,
                  $= 1,\ x \ge d$,

with the further assumption that g(x) is bounded for 0 < x < d.

Under the assumption that $f(x) \not\equiv g(x)$, set

(14)  $\Delta = \max_{x} |f(x) - g(x)|$,

and let y be the largest integer in [0, d] for which the maximum, assumed
non-zero, is attained.

If we let $p_i = p_i(y)$, $q_i = q_i(y)$, $\bar{p}_i = \bar{p}_i(y)$, $\bar{q}_i = \bar{q}_i(y)$ be sets of values
for which the Min Max = Max Min is assumed, we have

(15)  $f(y) = T(p, q, f)$,

      $g(y) = T(\bar{p}, \bar{q}, g)$,

and,  as  in  Lemma  1, 

(16)  $\Delta = |f(y) - g(y)| \le \max_{p,\ q} [\,|T(p, q, f - g)|\,]$.

Since, for all p and q,

(17)  $|T(p, q, f - g)| \le \Delta$,

we see that (16) is an equality, which means that

(18)  T(p.q,f)  =  T(p,q,f),  ■ 

T  (p,  q.f )  =  T  (p,  q,  f) . 

Consider the relation

(19)  $f(y) - g(y) = p_1 q_1 [f(y-1) - g(y-1)]$
                  $+ p_2 q_1 [f(y+c) - g(y+c)]$
                  $+ p_1 q_2 [f(y+a) - g(y+a)]$
                  $+ p_2 q_2 [f(y-b) - g(y-b)]$.

Since $\sum_{i,j} p_i q_j = 1$, if any of the brackets in (19) have absolute value less
than $\Delta$, the corresponding coefficient $p_i q_j$ must be zero. By assumption,
y was the largest value for which $|f(y) - g(y)| = \Delta$. Hence $p_1 q_2 = 0$,
$p_2 q_1 = 0$.

Since $p_1 + p_2 = 1$, both $p_1$ and $p_2$ cannot be zero, which means $q_1 = 0$
or $q_2 = 0$. Turning to the game matrix


(20)  $\begin{pmatrix} f(y-1) & f(y+a) \\ f(y+c) & f(y-b) \end{pmatrix}$,

we see that the strict monotonicity of f(x) as a function of x makes it
impossible for $q_2 = 0$ or $q_1 = 0$ to be optimal play at x = y. This yields
a contradiction to $\Delta > 0$ and completes the proof of uniqueness.

We see then that the proof of uniqueness of a strictly increasing solution
is relatively easy, with the whole difficulty of the complete uniqueness
proof centering about the proof of strict monotonicity.

The method we have employed is quite general and applies to large
classes of functional equations. It fails, however, to treat the general case
where we assume only that the elements $a_{ij}$ of the game matrix A are real
quantities.


§  20.  An  approximation 

Let us now return to the general equation

(1)  $f(x) = \max_{p} \min_{q} \sum_{i,j} p_i q_j f(x + a_{ij})$

          $= \min_{q} \max_{p} \sum_{i,j} p_i q_j f(x + a_{ij})$,

and assume that x is large compared to $a_{ij}$.

The reasoning we shall employ below, while quite formal, possesses
many features of interest. Assume that we can write

(2)  $f(x + a_{ij}) \cong f(x) + a_{ij} f'(x)$.

Then (1) takes the form

(3)  $f(x) \cong \max_{p} \min_{q} \sum_{i,j} p_i q_j [f(x) + a_{ij} f'(x)]$

          $\cong \max_{p} \min_{q} [f(x) + f'(x) \sum_{i,j} p_i q_j a_{ij}]$,

which leads to

(4)  $0 \cong \max_{p} \min_{q} [f'(x) \sum_{i,j} p_i q_j a_{ij}]$

       $\cong \min_{q} \max_{p} [f'(x) \sum_{i,j} p_i q_j a_{ij}]$.

Assume now that f'(x) > 0. Then we obtain the approximate equations
for the unknown distributions p and q,

(5)  $0 = \max_{p} \min_{q} \sum_{i,j} a_{ij} p_i q_j$

       $= \min_{q} \max_{p} \sum_{i,j} a_{ij} p_i q_j$,

an equation which is independent of f(x)!

The  meaning  of  this  equation  is  that  for  large  x,  with  a  large  number  of 
plays  remaining  before  the  end  of  the  game,  the  play  is  approximately 
the  same  as  that  employed  in  the  single-stage  game  where  both  players 
wish  merely  to  maximize  the  expected  return  from  one  play. 

In taking $a_{ij}$ small compared to x we are essentially passing over to a
continuous version of the process. As we noted in § 18 of Chapter 8 in the
discussion of the nonlinear utility function, the optimal policy was
independent of the form of the criterion function. Here is another
manifestation of this general principle, and we shall encounter a further
example in § 22 devoted to a similar approximation for non-zero-sum games.

§  21.  Non-zero-sum  games — games  of  survival 

Let us now turn to a discussion of the more general situation where
$b_{ij} \ne -a_{ij}$. Here there is no generally acceptable theory for the
determination of optimal play in a single-stage process. Consequently, we shall
turn immediately to the discussion of a multi-stage process. Let us assume
once more that the players are both striving to ruin the other, and that
the game continues until this occurs. They are now in direct opposition
and we can use a Min-Max formulation.

Since  the  game  is  non-zero-sum,  the  state  of  the  process  depends  upon 
the  fortunes  of  both  A  and  B,  x  and  y,  respectively.  Let  us  define 

(1)  f(x, y) = probability that A ruins B when A has x, B has y, and both
     employ optimal policies.

Then f(x, y), provided that it exists, satisfies the functional equation

(2)  $f(x, y) = \max_{p} \min_{q} \sum_{i,j} p_i q_j f(x + a_{ij}, y + b_{ij})$

             $= \min_{q} \max_{p} \sum_{i,j} p_i q_j f(x + a_{ij}, y + b_{ij})$,

with the boundary conditions

(3)  $f(x, y) = 1,\ x > 0,\ y \le 0$,
             $= 0,\ x \le 0,\ y > 0$,
             $= 1/2,\ x = y = 0$ (by convention).

It is easy to establish the following result, using the methods we have
employed above.

Theorem 6. If $a_{ij} + b_{ij} < 0$ for all i, j, there is a unique bounded solution
to (2), (3).

§  22.  An  approximate  solution 

Let us assume that we are dealing with a process where $a_{ij}$ and $b_{ij}$ are
always negative. Then assuming that x and y are large compared to $a_{ij}$
and $b_{ij}$, and that we may write

(1)  $f(x + a_{ij}, y + b_{ij}) \cong f(x, y) + a_{ij} f_x + b_{ij} f_y$,

we obtain the approximate equation

(2)  $f(x, y) \cong \max_{p} \min_{q} \sum_{i,j} p_i q_j [f(x, y) + a_{ij} f_x + b_{ij} f_y]$

             $\cong \min_{q} \max_{p} \sum_{i,j} p_i q_j [f(x, y) + a_{ij} f_x + b_{ij} f_y]$.

From this we obtain the approximate equations

(3)  $0 = \max_{p} \min_{q} [f_x \sum_{i,j} a_{ij} p_i q_j + f_y \sum_{i,j} b_{ij} p_i q_j]$

       $= \min_{q} \max_{p} [f_x \sum_{i,j} a_{ij} p_i q_j + f_y \sum_{i,j} b_{ij} p_i q_j]$.

Using the same reasoning employed in § 4 of Chapter 9, we see that these
yield

(4)  $-f_x / f_y = \max_{p} \min_{q} [\sum_{i,j} b_{ij} p_i q_j / \sum_{i,j} a_{ij} p_i q_j]$

               $= \min_{q} \max_{p} [\sum_{i,j} b_{ij} p_i q_j / \sum_{i,j} a_{ij} p_i q_j]$.

This is a very reasonable criterion. Observe that it makes no difference
whether we solve for $f_x/f_y$ or $f_y/f_x$, since maximizing $f_x/f_y$ is equivalent
to minimizing $f_y/f_x$.

In the next section we shall demonstrate that Max Min in (4) actually
equals Min Max.

§  23.  Proof  of  the  extended  min-max  theorem 

In  this  section  we  wish  to  prove 

Theorem 7. If $\sum_{i,j} b_{ij} p_i q_j \ge d > 0$ for all distribution vectors p and q,
then

(1)  $\max_{p} \min_{q} \frac{\sum_{i,j} a_{ij} p_i q_j}{\sum_{i,j} b_{ij} p_i q_j} = \min_{q} \max_{p} \frac{\sum_{i,j} a_{ij} p_i q_j}{\sum_{i,j} b_{ij} p_i q_j}$.

Proof. There is no loss of generality in further assuming that $b_{ij} \le m < 1$
for all i, j so that $\sum_{i,j} b_{ij} p_i q_j \le m$ for all relevant p and q. Consider the
system of recurrence relations

(2)  $u_0 = \max_{p} \min_{q} \sum_{i,j} a_{ij} p_i q_j = \min_{q} \max_{p} \sum_{i,j} a_{ij} p_i q_j$,

     $u_{n+1} = \max_{p} \min_{q} [\sum_{i,j} a_{ij} p_i q_j + [1 - \sum_{i,j} b_{ij} p_i q_j]\, u_n]$

             $= \min_{q} \max_{p} [\sum_{i,j} a_{ij} p_i q_j + [1 - \sum_{i,j} b_{ij} p_i q_j]\, u_n]$.

It is easy to show, using the methods discussed above, that the sequence
$\{u_n\}$ converges to a value u, satisfying the equation

(3)  $u = \max_{p} \min_{q} [\sum_{i,j} a_{ij} p_i q_j + [1 - \sum_{i,j} b_{ij} p_i q_j]\, u]$

       $= \min_{q} \max_{p} [\sum_{i,j} a_{ij} p_i q_j + [1 - \sum_{i,j} b_{ij} p_i q_j]\, u]$.

The condition $0 < 1 - \sum_{i,j} b_{ij} p_i q_j \le 1 - d$ yields geometric convergence
of the series $\sum_{n=0}^{\infty} (u_{n+1} - u_n)$.

Since u satisfies (3), it is easy to see that it is given by the expression

(4)  $u = \max_{p} \min_{q} \sum_{i,j} a_{ij} p_i q_j / \sum_{i,j} b_{ij} p_i q_j$

       $= \min_{q} \max_{p} \sum_{i,j} a_{ij} p_i q_j / \sum_{i,j} b_{ij} p_i q_j$,

which establishes the theorem.
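
The recurrence used in the proof can be tried on a small example. The sketch below is only an illustration under assumptions of our own: the matrices A and B are hypothetical 2 x 2 data with $0.25 \le b_{ij} \le 0.5$, the helper names `value_2x2` and `ratio_game_value` are not from the text, and the value of each 2 x 2 game is taken from the elementary saddle-point test and mixed-strategy formula. Each step evaluates the game with entries $a_{ij} + (1 - b_{ij}) u_n$, which is the bilinear form inside the brackets of (2).

```python
def value_2x2(m):
    """Value of a 2 x 2 zero-sum game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:          # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)


def ratio_game_value(A, B, steps=300):
    """Iterate u_{n+1} = val(A + (1 - B) u_n), starting from u_0 = val(A)."""
    u = value_2x2(A)
    for _ in range(steps):
        M = tuple(tuple(A[i][j] + (1.0 - B[i][j]) * u for j in range(2))
                  for i in range(2))
        u = value_2x2(M)
    return u


# Hypothetical data: every b_ij lies in [0.25, 0.5], so d = 0.25 and m = 0.5.
A = ((2.0, 0.0), (0.0, 1.0))
B = ((0.5, 0.25), (0.25, 0.5))
u = ratio_game_value(A, B)

# At the limit, the game with matrix A - u B should have value zero.
residual = value_2x2(tuple(tuple(A[i][j] - u * B[i][j] for j in range(2))
                           for i in range(2)))
```

Adding the constant u to every entry of a game matrix raises its value by u, so the fixed-point relation (3) is the same statement as val(A - uB) = 0, which is what `residual` measures.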


§  24.  A  rationale  for  non-zero  sum  games 

The  importance  of  the  above  result,  combined  with  the  approximation 
procedure  discussed  in  §  14,  is  that  we  now  have  a  possible  rationale  for 
the  play  of  non-zero  sum  games,  namely  one  based  upon  the  criterion 
function 


(1)  $R(p, q) = \frac{\sum_{i,j} a_{ij} p_i q_j}{\sum_{i,j} b_{ij} p_i q_j}$.

Whether or not to accept this is a matter of individual taste. It must be
realized that this question must always arise in two-person processes,
where it is not a priori evident that both individuals are employing the
same criterion function, or, what is worse, they may not have
commensurable utility scales.

Exercises and research problems for Chapter X

1. Consider the following game. Two players, I and II, match coins
according to the following rules:

a. I and II both lose one, if a head-head combination occurs,

b. I gains one, II loses one if a tail-tail combination occurs,

c. I loses one, II gains one if head-tail, tail-head occur.

The first player starts with a quantity m and the second player with a
quantity n. Each plays so as to ruin the other. Let p(m, n; x, y) be the
probability that I will be ruined before or together with II if I shows
heads with probability x and II shows heads with probability y.

Define

$q_1 = xy$ = Probability that I and II both display heads
$q_2 = x(1 - y) + y(1 - x)$ = Probability that a head-tail combination
appears
$q_3 = (1 - x)(1 - y)$ = Probability that both I and II display tails.

Obtain the recurrence relation

$p(m, n) = q_1\, p(m-1, n-1) + q_2\, p(m-1, n+1) + q_3\, p(m+1, n-1)$,

for $m, n \ge 1$, with the boundary conditions

$p(m, 0) = 0,\ m \ge 1; \quad p(0, n) = 1,\ n \ge 0$.

(R. Bellman and D. Blackwell)

2. Show that for $n \ge 2$ we obtain the finite set of equations

$p(1, n) = (q_1 + q_2) + q_3\, p(2, n-1)$

$p(2, n-1) = q_1\, p(1, n-2) + q_2\, p(1, n) + q_3\, p(3, n-2)$

$\vdots$

$p(n-1, 2) = q_1\, p(n-2, 1) + q_2\, p(n-2, 3) + q_3\, p(n, 1)$

$p(n, 1) = q_2\, p(n-1, 2)$


3. Show that

$p(2, 1) = \frac{(q_1 + q_2)\, q_2}{1 - q_2 q_3}$,

and hence that $\min_{x} \max_{y} p(m, n, x, y) \ne \max_{y} \min_{x} p(m, n, x, y)$ in
general. (It is interesting to note that $\min_{x} \max_{y} \cong .4397$, x' = .43, y' = .5,
$\max_{y} \min_{x} \cong .4304$, x' = .43, y' = 3, where x' = 1 - x, y' = 1 - y.)

4. It follows from the fundamental min-max theorem for continuous
games that $\min_{A} \max_{B} K(m, n; A, B) = \max_{B} \min_{A} K(m, n; A, B)$, where

$K(m, n; A, B) = \iint p(m, n; x, y)\, dA(x)\, dB(y)$,

and A, B range over the space of monotone functions of uniformly
bounded variation equal to 1. Show that the solution for m = 2, n = 1
is given by

a. II chooses the value of y', $y_0$, for which p(2, 1, 0, y') = p(2, 1, 1, y'),
a pure strategy.

b. I chooses a mixed strategy, using either all heads or all tails in the
combination (a, 1 - a), where a is chosen so that a p(2, 1, 0, y') +
(1 - a) p(2, 1, 1, y') has a maximum at y' = $y_0$.

5. Show that the expected probability of ruin for I is $y_0 \cong .4302$, the
unique real root of $y' = (1 - y')^2 / (1 - y' + y'^2)$.
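
The numbers quoted in Exercises 3 to 5 can be checked numerically from the closed form for p(2, 1). The sketch below is only a check under stated assumptions: the function names `p21` and `indifference_root` are hypothetical, x and y denote the probabilities of heads, and the root is located by plain bisection on y' = 1 - y for the indifference condition of Exercise 4a, comparing player I showing all tails (x = 0) with all heads (x = 1).

```python
def p21(x, y):
    """p(2, 1; x, y) from Exercise 3: I shows heads w.p. x, II shows heads w.p. y."""
    q1 = x * y
    q2 = x * (1 - y) + y * (1 - x)
    q3 = (1 - x) * (1 - y)
    return (q1 + q2) * q2 / (1 - q2 * q3)


def indifference_root(tol=1e-12):
    """Bisection for the y' at which I is indifferent between x = 0 and x = 1."""
    def gap(y_tails):
        y = 1.0 - y_tails
        return p21(0.0, y) - p21(1.0, y)

    lo, hi = 0.0, 1.0          # gap(0) = 1 > 0 and gap(1) = -1 < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


y0 = indifference_root()
ruin_probability = p21(1.0, 1.0 - y0)      # equals y0 when x = 1
min_max_sample = p21(1.0 - 0.43, 1.0 - 0.5)  # x' = .43, y' = .5 from Exercise 3
```

With x = 1 the formula reduces to p = y', and with x = 0 it reduces to $(1 - y')^2/(1 - y' + y'^2)$, so the bisection is solving the equation stated in Exercise 5.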

6. Prove that as $m, n \to \infty$ along any fixed direction, m/n = r, player II
can choose y so that uniformly in x we have

$\lim_{m, n \to \infty} p(m, n) = 1$.

7.  Show  that  the  above  considerations  lead  to  the  following  principle:  In 
playing  a  game  of  this  type,  I  should  try  to  make  the  stakes  as  high  as 
possible,  whereas  II  should  try  to  make  them  as  low  as  possible. 


8. Let

$f_N(u_1, u_2, \ldots, u_N; v_1, v_2, \ldots, v_N) = \min_{y} \max_{x} \left[ \sum_{i,j=1}^{N} a_{ij} x_i y_j + \sum_{i=1}^{N} u_i x_i + \sum_{j=1}^{N} v_j y_j \right]$

$= \max_{x} \min_{y} [\ldots], \quad N = 1, 2, \ldots$

Derive a recurrence relation for $\{f_N\}$.


9. Consider the game of survival described by the matrix


where the total fortune of both players is 4 and k describes the fortune of
the first player. Show that f(k), the probability of survival of the first
player, satisfies the equations

$f(1) = f(3)\, f(2) / (f(2) + f(3))$

$f(2) = f(3) / (1 + f(3) - f(1))$

$f(3) = (1 - f(2)\, f(1)) / (2 - f(2) - f(1))$     (Hausner)

10. Hence show that

$f(1) = 1 - \sqrt{2}/2,\ f(2) = 1/2,\ f(3) = \sqrt{2}/2$,

and that the corresponding optimal strategies are given by

$p_1 = \sqrt{2} - 1,\ p_2 = 1/2,\ p_3 = 2 - \sqrt{2}$
$q_1 = \sqrt{2} - 1,\ q_2 = 1 - \sqrt{2}/2,\ q_3 = \sqrt{2} - 1$
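
The values stated in Exercise 10 can be substituted into the three relations of Exercise 9 as a quick consistency check. This is only a numerical verification, not a derivation: the relations are read as f(1) = f(3) f(2)/(f(2) + f(3)), f(2) = f(3)/(1 + f(3) - f(1)) and f(3) = (1 - f(2) f(1))/(2 - f(2) - f(1)), and the variable names are our own.

```python
from math import sqrt, isclose

# Values claimed in Exercise 10.
f1 = 1 - sqrt(2) / 2
f2 = 0.5
f3 = sqrt(2) / 2

# Right-hand sides of the three relations of Exercise 9.
rhs1 = f3 * f2 / (f2 + f3)
rhs2 = f3 / (1 + f3 - f1)
rhs3 = (1 - f2 * f1) / (2 - f2 - f1)

consistent = (isclose(f1, rhs1, abs_tol=1e-12)
              and isclose(f2, rhs2, abs_tol=1e-12)
              and isclose(f3, rhs3, abs_tol=1e-12))
```

The three values also satisfy f(1) + f(3) = 1 and f(2) = 1/2, as one would expect from the symmetry between the two players when the total fortune is 4.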
11. Consider the game of survival described by


where a and b are positive integers. Let $v_n(k)$ be the probability that A
survives when the fortunes of both players total n and A possesses k of
this. Show that

$v_{n+1}(k+1) = v_n(k) + (1 - v_n(k))\, v_{n+1}(1)$.


12. Show that

$v_{n+1}(1) = \frac{v_{n+1}(1+a)\, v_{n+1}(1+b)}{v_{n+1}(1+a) + v_{n+1}(1+b)}$,

and hence that

$v_{n+1}(1) = \frac{\sqrt{v_n(a)\, v_n(b)}}{1 + \sqrt{v_n(a)\, v_n(b)}}$.

13. Show that

$p_n(1) = \frac{v_n(1+b)}{v_n(1+a) + v_n(1+b)} = \frac{v_n(1)}{v_n(1+a)}$,

$p_{n+1}(k+1) = p_n(k) = p_{n-k+1}(1)$.
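
The passage from the first formula of Exercise 12 to the second rests on the relation of Exercise 11, $v_{n+1}(k+1) = v_n(k) + (1 - v_n(k))\, v_{n+1}(1)$, applied with k = a and k = b. The sketch below checks this numerically; the function names and the sample values of $v_n(a)$ and $v_n(b)$ are hypothetical.

```python
from math import sqrt


def v_next_at_1(va, vb):
    """Closed form of Exercise 12: v_{n+1}(1) from va = v_n(a), vb = v_n(b)."""
    s = sqrt(va * vb)
    return s / (1 + s)


def residual(va, vb):
    """Difference between the two sides of the first relation of Exercise 12."""
    w = v_next_at_1(va, vb)
    x = va + (1 - va) * w      # v_{n+1}(1 + a), by Exercise 11 with k = a
    y = vb + (1 - vb) * w      # v_{n+1}(1 + b), by Exercise 11 with k = b
    return w - x * y / (x + y)


sample = v_next_at_1(0.25, 1.0)   # sqrt(0.25) / (1 + sqrt(0.25)) = 1/3
check = residual(0.3, 0.6)
```

Writing w for $v_{n+1}(1)$ and substituting, the first relation becomes the quadratic $(1 - AB) w^2 + 2AB\, w - AB = 0$ with $A = v_n(a)$, $B = v_n(b)$, whose root in [0, 1] is the closed form used above.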


14. Prove Theorem 7 by showing that $\mathrm{val}(A - \lambda B)$ is a continuous
function of $\lambda$ which is monotone decreasing as a function of $\lambda$. Hence
show that there is exactly one solution of the equation $\mathrm{val}(A - \lambda B) = 0$
which may be represented in the form given in (23.1). (Karlin)

15. Consider the equation

$u(p) = L(u(p, q, q')) + a(p, q, q')$,

and the related equation

$v(p) = \max_{q} \min_{q'} [L(v(p, q, q')) + a(p, q, q')]$

      $= \min_{q'} \max_{q} [\ldots]$.

Under what conditions may we write

$v(p) = \max_{q} \min_{q'} u(p) = \min_{q'} \max_{q} u(p)$ ?

16. Consider, in particular, the system of equations

$x_i = \max_{q} \min_{q'} [c_i(q, q') + \sum_{j=1}^{n} a_{ij}(q, q')\, x_j]$

     $= \min_{q'} \max_{q} [c_i(q, q') + \sum_{j=1}^{n} a_{ij}(q, q')\, x_j], \quad i = 1, 2, \ldots, n$,

under appropriate conditions concerning the matrix $A(q, q') = (a_{ij}(q, q'))$.
(L. Shapley)

17. Suppose that we are given the information that a coin has a fixed but
unknown probability p of landing heads and a probability q = 1 - p of
landing tails, and that p has a known a priori distribution function dF(p).

The  coin  is  to  be  tossed  N  times  and  we  are  to  call  heads  or  tails  before 
each  toss  with  the  full  knowledge  of  the  results  of  the  previous  tosses. 

What  policy  maximizes  the  expected  number  of  correct  calls  ? 

18. Suppose that we can toss the coin as many times as we please, at a
cost of c per toss, and then are required to furnish a value for p, the
probability of heads. If p' is the value decided upon, the cost of deviation
from the true value is g(p - p'), where g is a known function. What
policy minimizes the total expected cost?

19. Returning to problem 17, suppose that an opponent has the choice
of choosing F(p) so as to minimize the expected number of correct calls
obtained using an optimal policy. Can one characterize the optimal
selection of F(p) by the statement that the opponent chooses F(p) in
such a way as to minimize the information available after any finite set of
tosses? On this hypothesis, determine Min-Max.

20.  Generalize  these  results  to  cases  where  there  are  many  different 
possible  outcomes  at  each  stage,  e.g.  a  six-sided  die. 

21. Player A has resources in quantity x, and Player B resources in
quantity y. A divides x up into n parts, $x = \sum_{i} x_i$, $x_i \ge 0$, and B likewise,
$y = \sum_{i} y_i$, $y_i \ge 0$. The payoff to A is

$P(x, y) = \sum_{i=1}^{n} c_i \max(x_i - y_i, 0)$,

and the negative of this to B.

Write

$f_n(x, y) = \max_{G} \min_{G'} \left[ \int P(x, y)\, dG(x_1, x_2, \ldots, x_n)\, dG'(y_1, y_2, \ldots, y_n) \right]$

          $= \min_{G'} \max_{G} [\ldots]$.

Obtain the recurrence relations connecting $f_n$ and $f_{n-1}$. (Colonel Blotto)

22. Let A be a positive matrix, i.e. $a_{ij} > 0$ for all i, j. Show that A has a
unique characteristic root of largest absolute value, which is positive,
and that the associated characteristic vector may be taken to be positive.
Denote this root by p(A), the Perron root of A.

23. Show that

$p(A) = \max_{x} \min_{i} \sum_{j=1}^{n} a_{ij} x_j / x_i$

      $= \min_{x} \max_{i} \sum_{j=1}^{n} a_{ij} x_j / x_i$,

where the variation is over the region $x_i \ge 0$, $\sum_{i} x_i = 1$.
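
The min-max characterization of Exercise 23 can be illustrated on a small positive matrix. The following is only a numerical sketch under assumptions of our own: the 2 x 2 matrix is hypothetical, the function names `ratios` and `perron` are not from the text, and the Perron vector is approximated by plain power iteration, normalized so that the components sum to 1.

```python
def ratios(A, x):
    """The quantities sum_j a_ij x_j / x_i, one per row i."""
    return [sum(a * xj for a, xj in zip(row, x)) / xi
            for row, xi in zip(A, x)]


def perron(A, sweeps=500):
    """Power iteration on the simplex; returns (min ratio, max ratio, vector)."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(sweeps):
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        s = sum(y)
        x = [yi / s for yi in y]   # keep sum of components equal to 1
    r = ratios(A, x)
    return min(r), max(r), x


# Hypothetical positive matrix with Perron root (5 + sqrt(5)) / 2.
A = [[2.0, 1.0],
     [1.0, 3.0]]
lo, hi, x = perron(A)
trial = ratios(A, [0.5, 0.5])      # a non-optimal x: gives the ratios 3 and 4
```

At the Perron vector all the ratios coincide with p(A), while for any other positive x the smallest ratio lies below p(A) and the largest above it, which is the content of the Max Min and Min Max statements.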

24. Show that

$p(A) = \max_{R'} \min_{i} \sum_{j=1}^{n} a_{ij} x_j / x_i$

      $= \min_{R'} \max_{i} \sum_{j=1}^{n} a_{ij} x_j / x_i$,

where R' is defined by $x_i \ge d$, $\sum_{i} x_i = 1$, and d may be taken to be

$d = \min_{i,j} a_{ij} / \max_{i} (\sum_{j} a_{ij})$.

25. Prove that p(A) is the unique solution of

$\lambda = \max_{R'} \min_{i} [\sum_{j=1}^{n} a_{ij} x_j + \lambda (1 - x_i)]$,

or of

$\lambda = \min_{R'} \max_{i} [\sum_{j=1}^{n} a_{ij} x_j + \lambda (1 - x_i)]$,

where R' is as above.

26. Consider the nonlinear recurrence relation

$u_{n+1} = \min_{R'} \max_{i} [\sum_{j=1}^{n} a_{ij} x_j + u_n (1 - x_i)]$,

with $u_0$ arbitrary. Prove that $p(A) = \lim_{n \to \infty} u_n$.

(Proc. Amer. Math. Soc., 1956)

theorem and the original proof are due to Von Neumann.


CHAPTER  XI 


Markovian  Decision  Processes 

§  1.  Introduction 

In  this  chapter  we  shall  study  some  decision  processes  of  a  different 
form  than  those  previously  encountered,  giving  rise  to  a  new  class  of 
functional  equations. 

We shall consider discrete processes, which lead us to the study of the
difference equation

(1)  $x_i(n+1) = \max_{q} \sum_{j=1}^{N} a_{ij}(q)\, x_j(n), \quad x_i(0) = c_i,\ i = 1, 2, \ldots, N$,

and some continuous processes which generate the equation

(2)  $dx_i/dt = \max_{q} \sum_{j=1}^{N} a_{ij}(q)\, x_j(t), \quad x_i(0) = c_i,\ i = 1, 2, \ldots, N$,

in the one-person case, and the equation

(3)  $dx_i/dt = \max_{q} \min_{p} [\sum_{j=1}^{N} a_{ij}(p, q)\, x_j(t)], \quad x_i(0) = c_i,\ i = 1, 2, \ldots, N$,

             $= \min_{p} \max_{q} [\ldots]$,

in the two-person case.

As we shall see, equations of this type have connections with the
classical theory of differential and difference equations. We shall, however,
reserve any detailed exploration of this liaison until the second volume.

§  2.  Markovian  decision  processes 

Let us describe, in this section, a decision process which motivates the
study of a class of nonlinear difference equations, of which (1.1) is a
representative. We shall then consider the limiting form, namely (1.2).

Consider a physical system S which at any of the times $t = 0, \Delta,
2\Delta, \ldots$ must be in one of a set of states which we denote by $S_1, S_2, \ldots,
S_N$. Assume that at any time t there is a probability $x_i(t)$ that the system
is in the ith state, and that transition probabilities exist governing the
changeover from one state to another. It is important to realize that
these are very strong assumptions concerning the nature of the system.
Let

(1)  $a_{ij}$ = the probability that the system will be in state i at $t + \Delta$ if it
     is in state j at time t.

The relation between the set of probabilities $\{x_i(t + \Delta)\}$ and the set
$\{x_i(t)\}$ is then given by the relations

(2)  $x_i(t + \Delta) = \sum_{j=1}^{N} a_{ij}\, x_j(t),\ i = 1, 2, \ldots, N$,

for $t = 0, \Delta, 2\Delta, \ldots$. Setting $x_i(n\Delta) = y_i(n)$, we may write these
equations in the simpler form

(3)  $y_i(n+1) = \sum_{j=1}^{N} a_{ij}\, y_j(n),\ i = 1, 2, \ldots, N$.

The asymptotic behavior of the state vector $(y_1, y_2, \ldots, y_N)$ as $t \to \infty$
is determined by the algebraic character of the characteristic roots of
the matrix $A = (a_{ij})$. A process of this type is called a Markoff process.
There exists an extensive mathematical theory of these processes.
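
The recurrence (3) is simply repeated multiplication of the state vector by the matrix of transition probabilities. A minimal sketch follows, assuming a hypothetical two-state matrix whose columns sum to 1 (entry A[i][j] is the probability of moving to state i from state j, as in (1)); the function name `step` is our own.

```python
def step(A, y):
    """One application of (3): y_i(n + 1) = sum_j a_ij y_j(n)."""
    n = len(y)
    return [sum(A[i][j] * y[j] for j in range(n)) for i in range(len(A))]


# Hypothetical transition matrix; each column sums to 1.
A = [[0.9, 0.5],
     [0.1, 0.5]]

y = [1.0, 0.0]            # start in the first state with certainty
for _ in range(200):
    y = step(A, y)
```

For this matrix the characteristic roots are 1 and 0.4, so the state vector approaches the stationary distribution (5/6, 1/6) geometrically, which illustrates how the roots govern the asymptotic behavior.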

Let us now consider Markovian decision processes. Assume that the
transition probabilities, $a_{ij}$, depend upon a parameter q, which may be a
vector, and that at each stage of the process q is to be chosen so as to
maximize the probability that the system is in the state $S_1$. In place of
the equations in (3) we obtain the nonlinear system

(4)  $y_1(n+1) = \max_{q} \sum_{j=1}^{N} a_{1j}(q)\, y_j(n)$,

     $y_i(n+1) = \sum_{j=1}^{N} a_{ij}(q^*)\, y_j(n),\ i = 2, 3, \ldots, N$,

where $q^* = q^*(n)$ in the remaining N - 1 equations is one of the values
of q which maximizes $y_1(n+1)$.

Since the $a_{ij}$ are transition probabilities, they are restricted by the
conditions

(4)  $a_{ij} \ge 0,\ \sum_{i} a_{ij} = 1,\ j = 1, 2, \ldots, N$,

for all q.
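
One stage of the nonlinear system (4) can be sketched as follows, assuming for illustration that q ranges over a finite set. The labels "slow" and "fast", the two matrices, and the function name `decision_step` are hypothetical and serve only to show how $q^*$ is selected and then reused in the remaining equations.

```python
def decision_step(matrices, y):
    """One stage of (4): pick q* maximizing y_1(n + 1), then update every y_i.

    matrices maps each admissible q to a matrix A(q) whose columns sum to 1.
    Returns (q*, y(n + 1)).
    """
    n = len(y)

    def first_component(A):
        return sum(A[0][j] * y[j] for j in range(n))

    q_star = max(matrices, key=lambda q: first_component(matrices[q]))
    A = matrices[q_star]
    return q_star, [sum(A[i][j] * y[j] for j in range(n)) for i in range(len(A))]


# Two hypothetical choices of q for a two-state system.
matrices = {
    "slow": [[0.6, 0.2],
             [0.4, 0.8]],
    "fast": [[0.5, 0.7],
             [0.5, 0.3]],
}
q_star, y_next = decision_step(matrices, [0.5, 0.5])
q_other, _ = decision_step(matrices, [1.0, 0.0])
```

The maximizing q depends on the current state vector: from (0.5, 0.5) the choice "fast" gives the larger first component, while from (1, 0) the choice "slow" does. This dependence on the state is what makes the system nonlinear.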

To obtain more general equations, consider the situation in which we
have N different types of items and let $x_i(t)$ represent the quantity of
the ith item at time t. These items have the property that a unit
quantity of the ith item generates an amount $a_{ij}$ of the jth item over the time
interval $[t, t + \Delta)$. Here $a_{ij} > 0$ represents production, and the reverse
inequality represents consumption. Once again let $a_{ij}$ depend upon a
parameter q and let the purpose of the process be to maximize the
quantity of the first item available at any time. In this case we obtain
the equation in (4) with no restriction on the magnitude or sign of the $a_{ij}$.

In the limit as $\Delta \to 0$, we obtain in place of (4) the nonlinear
differential system

(5)  $\dfrac{dx_1}{dt} = \max_q \sum_{j=1}^{N} b_{1j}(q)\, x_j(t), \quad x_1(0) = c_1,$

     $\dfrac{dx_i}{dt} = \sum_{j=1}^{N} b_{ij}(q^*)\, x_j(t), \quad x_i(0) = c_i, \quad i = 2, 3, \ldots, N.$

To obtain this system, we set, in the usual fashion

(6)  $a_{ij} = b_{ij}\,\Delta, \quad i \ne j,$

     $a_{ii} = 1 - b_{ii}\,\Delta,$

and then let $\Delta \to 0$. Having obtained the equations by means of this
formalism, we now define the continuous process by means of the equation
in (5). In return for this, we must establish existence and uniqueness of
solutions, which is to say we must show that this method of defining a
process is actually valid.


§ 3. Notation


Taking account of the foregoing remarks, we shall begin by considering
the continuous version. Introducing the vector-matrix notation

(1)  $x = (x_1, x_2, \ldots, x_N), \quad c = (c_1, c_2, \ldots, c_N), \quad A(q) = (a_{ij}(q)),$

the system

(2)  $\dfrac{dx_i}{dt} = \max_q \sum_{j=1}^{N} a_{ij}(q)\, x_j, \quad x_i(0) = c_i, \quad i = 1, 2, \ldots, N,$

takes the form

(3)  $dx/dt = \max_q A(q)\, x, \quad x(0) = c,$

where it is understood that the maximum is taken element by element.
By this we mean that the set of $q$'s for each row is distinct from the
corresponding set for any other row. Thus,

(4)  $a_{1j}(q) = a_{1j}(q_{11}, q_{12}, \ldots, q_{1k}),$

     $a_{2j}(q) = a_{2j}(q_{21}, q_{22}, \ldots, q_{2k}),$

     $\ldots$

     $a_{Nj}(q) = a_{Nj}(q_{N1}, q_{N2}, \ldots, q_{Nk}),$

so that there is no interaction between the various maximizations. After
discussing this case, we shall return to the equations obtained in the
preceding sections, where interaction occurs.

It is convenient to employ the notation

(5)  $\|x\| = \sum_{i=1}^{N} |x_i|,$

     $\|A\| = \sum_{i,j=1}^{N} |a_{ij}|.$

These fulfill the usual requirements for norms, and in addition we have

(10)  $\|Ax\| \le \|A\|\, \|x\|.$


§  4.  A  lemma 

As is usual in the theory of differential equations, the first step in
establishing existence and uniqueness of a solution consists of converting the
differential equation into a suitable integral equation. This enables us to
take advantage of the smoothing properties of integration.

Considering the more general equation

(1)  $dx/dt = \max_q\, [A(q, t)\, x + b(q, t)], \quad x(0) = c,$

we obtain the integral equation

(2)  $x = c + \int_0^t \max_q\, [A(q, s)\, x + b(q, s)]\, ds,$

which may be written

(3)  $x = \max_q \left[ c + \int_0^t b(q, s)\, ds + \int_0^t A(q, s)\, x\, ds \right].$

Since $q$ is a function of $t$, pointwise maximization yields global
maximization.

It is easy to demonstrate the following result in much the same way as
Lemma 1 of Chapter 2 was established.

Lemma. Let

(4)  $T_1(x) = \max_q \left[ b_1(q, t) + \int_0^t A(q, s)\, x\, ds \right],$

     $T_2(y) = \max_q \left[ b_2(q, t) + \int_0^t A(q, s)\, y\, ds \right];$

then

(5)  $\|T_1(x) - T_2(y)\| \le \max_q \left[ \|b_1(q, t) - b_2(q, t)\| + \int_0^t \|A(q, s)\|\, \|x - y\|\, ds \right].$

This lemma will be the fulcrum of our existence and uniqueness proof.
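The lemma rests on an elementary fact about maxima, which may be worth writing out. The following is only a sketch: the shorthand $g_1$, $g_2$ is introduced for this remark and is not the book's notation, and the inequalities are meant component by component, in keeping with the convention that the maximum is taken element by element.

```latex
% Shorthand (an assumption of this remark):
%   g_1(q) = b_1(q,t) + \int_0^t A(q,s)\,x\,ds ,
%   g_2(q) = b_2(q,t) + \int_0^t A(q,s)\,y\,ds .
\begin{align*}
\max_q g_1(q) &\le \max_q \bigl[\, g_2(q) + |g_1(q) - g_2(q)| \,\bigr]
               \le \max_q g_2(q) + \max_q |g_1(q) - g_2(q)| , \\
\bigl| \max_q g_1(q) - \max_q g_2(q) \bigr|
              &\le \max_q |g_1(q) - g_2(q)|
               \quad \text{(interchanging the roles of $g_1$ and $g_2$)} , \\
|g_1(q) - g_2(q)| &\le |b_1(q,t) - b_2(q,t)| + \int_0^t |A(q,s)\,(x - y)|\, ds .
\end{align*}
```

Summing over the components and using $\|Ax\| \le \|A\|\,\|x\|$ then gives an estimate of the form stated in (5).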

§  5.  Existence  and  uniqueness — I 

Let  us  now  consider  the  question  of  the  existence  and  uniqueness  of 
solutions  of  the  equation 

(1)  $dx/dt = \max_q\, [A(q, t)\, x + b(q, t)], \quad x(0) = c.$

There are a number of cases of particular interest, corresponding to
different assumptions that can be made concerning the functions $A(q, t)$,
$b(q, t)$, and the set of admissible functions $q(t)$. We shall discuss one class
of equations and leave the matter there, since the method used will
illustrate the procedures that may be employed in other cases.

Our  first  result  is 

Theorem 1. Assume that $q$ is an element of a set of functions $S$ with the
property that

(2)  $\|A(q, t)\|,\ \|b(q, t)\| \le f(t),$

where $f(t)$ is integrable over any finite interval $0 \le t \le T$. Assume further
that the maximum of $A(q, t)\, x + b(q, t)$ is attained for $q \in S$ for any fixed
$t$ and $x$ values.$^1$

Then there is a unique solution to (1) satisfying the equation almost
everywhere. This solution may be found as the limit of the successive
approximations,

(3)  $x_0 = c,$

     $x_{n+1} = c + \max_q \int_0^t [A(q, s)\, x_n + b(q, s)]\, ds, \quad n = 0, 1, \ldots$

1 The purpose of this assumption is to handle simultaneously the case where $q$
assumes only a discrete set of values, in which case the maximum is always attained,
and the case where $q$ varies continuously.

Proof. Let us first show that the $x_n$ are uniformly bounded in the
interval $[0, T]$. Specifically, we shall show that

(3)  $\|x_n\| \le a \exp\left( \int_0^t f(s)\, ds \right),$

where

(4)  $a = \|c\| + \int_0^T f(s)\, ds.$

The inequality certainly holds for $n = 0$. Assume that it is valid for
$k = 0, 1, \ldots, n$. Then we have, from (3),

(5)  $\|x_{n+1}\| \le \|c\| + \int_0^t \max_q \|b(q, s)\|\, ds + \int_0^t \left( \max_q \|A(q, s)\| \right) \|x_n\|\, ds$

     $\le a + \int_0^t f(s)\, \|x_n\|\, ds.$

Replacing $\|x_n\|$ by its bound, we have

(6)  $\|x_{n+1}\| \le a + \int_0^t f(s) \left[ a \exp\left( \int_0^s f(s_1)\, ds_1 \right) \right] ds.$

The integrand is the derivative of $a \exp\left( \int_0^s f(s_1)\, ds_1 \right)$, so the right side
equals $a \exp\left( \int_0^t f(s)\, ds \right)$, and we thus obtain the same bound for $\|x_{n+1}\|$.

Let us now establish the convergence of the sequence $\{x_n\}$. Applying
the Lemma of § 4 to the two relations

(7)  $x_{n+1} = c + \max_q \left[ \int_0^t [A(q, s)\, x_n + b(q, s)]\, ds \right],$

     $x_n = c + \max_q \left[ \int_0^t [A(q, s)\, x_{n-1} + b(q, s)]\, ds \right],$

we obtain the inequality

(8)  $\|x_{n+1} - x_n\| \le \max_q \int_0^t \|A(q, s)\|\, \|x_n - x_{n-1}\|\, ds$

     $\le \int_0^t f(s)\, \|x_n - x_{n-1}\|\, ds, \quad n = 1, 2, \ldots$

Iterating this relation, starting with the inequality for $\|x_1 - x_0\|$,
we obtain the inequality

(9)  $\|x_{n+1} - x_n\| \le (\|c\| + 1) \left( \int_0^t f(s)\, ds \right)^{n+1} \Big/ (n+1)!,$

which establishes the uniform convergence of the sequence $\{x_n\}$ in the
interval $[0, T]$ to a function $x(t)$. This function is continuous for
$0 \le t \le T$, satisfies the integral equation

(10)  $x(t) = c + \int_0^t \max_q\, [A(q, s)\, x + b(q, s)]\, ds,$

and hence the differential equation almost everywhere.
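The successive approximations of Theorem 1 can be tried numerically. The sketch below is only illustrative: it treats a scalar equation with a finite set of constant pairs $(a_q, b_q)$, replaces the integral by a cumulative trapezoidal rule on a uniform grid, and uses the hypothetical name `successive_approximations`. None of these choices comes from the text.

```python
import math

def successive_approximations(options, c, T, steps=2000, iterations=40):
    """Iterate x_{n+1}(t) = c + integral_0^t max_q [a_q x_n(s) + b_q] ds on a grid."""
    h = T / steps
    x = [c] * (steps + 1)                     # x_0(t) = c
    for _ in range(iterations):
        # Integrand max_q [a_q x_n(s) + b_q], evaluated pointwise in s.
        g = [max(a * xs + b for a, b in options) for xs in x]
        new_x = [c]
        for k in range(steps):                # cumulative trapezoidal rule
            new_x.append(new_x[-1] + 0.5 * h * (g[k] + g[k + 1]))
        x = new_x
    return x

# Hypothetical example: dx/dt = max(x, 1), x(0) = 0.5, on [0, 1].
x = successive_approximations(options=[(1.0, 0.0), (0.0, 1.0)], c=0.5, T=1.0)
# For this example x = 0.5 + t up to t = 0.5, and x = exp(t - 0.5) afterwards.
exact = math.exp(0.5)
```

In this example the final value `x[-1]` can be compared with `exact`; the factorial bound in (9) suggests that a moderate number of iterations already suffices on a fixed interval.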

Finally, let us demonstrate uniqueness. Let $y(t)$ be another solution
of the equation, existing in some interval $[0, t_0]$. Then in this interval $y(t)$
satisfies the equation in (10). Applying the lemma of § 4, we derive the
inequality

(11)  $\|x(t) - y(t)\| \le \max_q \int_0^t \|A(q, s)\|\, \|x - y\|\, ds$

      $\le \int_0^t f(s)\, \|x - y\|\, ds.$

This inequality has the form

(12)  $u(t) \le \int_0^t f(s)\, u(s)\, ds,$

where $f(s),\ u(s) \ge 0$.

Hence, for an arbitrarily small positive constant $a$, we have

(13)  $u(t) \le a + \int_0^t f(s)\, u(s)\, ds.$

Dividing through, this yields

(14)  $\dfrac{f(t)\, u(t)}{a + \int_0^t f(s)\, u(s)\, ds} \le f(t).$

Integrating between $0$ and $t$, we have

(15)  $a + \int_0^t f(s)\, u(s)\, ds \le a \exp\left( \int_0^t f(s)\, ds \right).$

Combining this with (13), we obtain the inequality

(16)  $u(t) \le a \exp\left( \int_0^t f(s)\, ds \right).$

Since $a$ is an arbitrary positive constant, we see that $u(t) = 0$.

An alternative proof proceeds as follows. It is clear that a constant $b$
exists such that $\|x - y\| \le b$ in $[0, t_0]$. Hence

(17)  $u(t) \le b \int_0^t f(s)\, ds.$

Use this inequality on the right side of (12), obtaining

(18)  $u(t) \le b \int_0^t \left( f(s) \int_0^s f(s_1)\, ds_1 \right) ds = \dfrac{b}{2!} \left( \int_0^t f(s)\, ds \right)^2.$

Continuing in this way, we have for each $n = 1, 2, \ldots$, the inequality

(19)  $u(t) \le \dfrac{b}{(n+1)!} \left( \int_0^t f(s)\, ds \right)^{n+1}.$

Letting $n \to \infty$, we see again that $u(t) = 0$.


§  6.  Existence  and  uniqueness — II 

Let us now consider the equation of (2.5). In general, equations of this
type need not have unique solutions, due to multiplicity of
maximizing $q$-values. Consider, for example, the equation

(1)  $dx_1/dt = \max_q\, [1 - q^2 (1 - q)^2] + x_2, \quad x_1(0) = 0,$

     $dx_2/dt = q^* x_2, \quad x_2(0) = 1.$

Since $q^* = 0$ or $1$, we obtain infinitely many sets of solutions, of which the
following are representative:

(2)  $x_1 = 2t, \quad x_2 = 1;$

     $x_1 = t + (e^t - 1), \quad x_2 = e^t.$

We can, however, obtain uniqueness theorems if we restrict ourselves
to solutions obtained in the following way. First solve the equations

(3)  $dx_2/dt = b_2(q, t) + \sum_{j=1}^{N} a_{2j}(q)\, x_j, \quad x_2(0) = c_2,$

     $\ldots$

     $dx_N/dt = b_N(q, t) + \sum_{j=1}^{N} a_{Nj}(q)\, x_j, \quad x_N(0) = c_N,$

for the quantities $x_2, x_3, \ldots, x_N$ in terms of the function $x_1$, regarding $q$ for
the moment as some unknown function.

Each $x_k$, $k = 2, 3, \ldots, N$, will have the form

(4)  $x_k = u_k(q, t) + \int_0^t v_k(q, t, s)\, x_1(s)\, ds.$

Substituting these expressions into the equation

(5)  $dx_1/dt = \max_q \left[ b_1(q, t) + \sum_{j=1}^{N} a_{1j}(q)\, x_j \right], \quad x_1(0) = c_1,$

we obtain an equation of the form

(6)  $dx_1/dt = \max_q \left[ b(q, t) + a_{11}(q)\, x_1 + \int_0^t v(q, t, s)\, x_1(s)\, ds \right].$

This equation we write in the form

(7)  $x_1 = c_1 + \max_q \left[ \int_0^t b(q, s)\, ds + \int_0^t a_{11}(q)\, x_1\, ds + \int_0^t \left[ \int_0^{t_1} v(q, t_1, s)\, x_1(s)\, ds \right] dt_1 \right].$

Using the methods employed in § 5, it is easy to establish the existence of
a unique solution of this equation under the hypotheses of Theorem 1.


§  7.  Existence  and  Uniqueness — III 

It is possible, using the same technique of successive approximation
and inequalities, to establish existence and uniqueness theorems for
more general systems of differential equations of the form

(1)  $dx/dt = \max_q f(x, q, t), \quad x(0) = c.$

Since these results are more within the province of differential equations
than pertinent to the theory of decision processes, we shall leave it for
the ambitious reader to frame his own analogues of the classical existence
and uniqueness theorems.


§  8.  The  Riccati  equation 

Although we do not wish to penetrate too deeply here into the study of
this class of nonlinear differential equations, the following result seems
particularly worthy of notice.

The change of variable

(1)  $v = u'/u$

converts the general second order linear differential equation

(2)  $u'' + p(t)\, u' + q(t)\, u = 0,$

into the first order nonlinear equation

(3)  $v' + v^2 + p(t)\, v + q(t) = 0.$

This equation is called a Riccati equation. It is clear from the foregoing
that the general solution of (3) is equivalent to the general solution of (2),
and hence, in general, cannot be obtained explicitly in terms of
quadratures.

Let us now show that (3) can be interpreted to be an equation of the
general class exhibited above. We begin with the observation that

(4)  $-v^2 = \min_w\, (w^2 - 2wv).$

Hence (3) may be written

(5)  $v' = \min_w\, [w^2 - 2wv - p(t)\, v - q(t)],$

where $w$ now varies over all functions of $t$.

For fixed $w$, let $V(w, t)$ represent the solution of

(6)  $V' = w^2 - 2wV - p(t)\, V - q(t),$

fixed by the condition $V(0) = v(0) = c$. This solution has the explicit
representation

(7)  $V = c \exp\left( -\int_0^t (p(t_1) + 2w(t_1))\, dt_1 \right) + \int_0^t \exp\left( -\int_s^t (p(t_1) + 2w(t_1))\, dt_1 \right) (w^2(s) - q(s))\, ds,$

obtained in the usual way by means of an integrating factor.
Let us now show that

(8)  $v = \min_w V(w, t).$

For an arbitrary function $w = w(t)$, we have

(9)  $v' \le w^2 - 2wv - p(t)\, v - q(t),$

which shows that $v \le V(w, t)$. Hence $v \le \min_w V(w, t)$. On the other
hand $v = V(w^*, t)$ for the minimizing value $w^*$, which is actually $v(t)$.
Hence the equality in (8) holds.

We  thus  have  an  explicit  representation  for  the  solution  of  the  Riccati 
equation  in  terms  of  quadratures  and  a  minimization. 
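The identity (4) on which this section turns is a completion of the square. Written out as a short check, it adds nothing beyond the text:

```latex
\[
w^2 - 2wv \;=\; (w - v)^2 - v^2 \;\ge\; -v^2 ,
\qquad \text{with equality exactly when } w = v .
\]
```

This is also why the minimizing function $w^*$ in (8) coincides with $v(t)$ itself.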

§  9.  Approximation  in  policy  space 

As we have discussed in the preceding chapters, there are two types of
successive approximations in the theory of dynamic programming, one
based upon approximation to the functions which satisfy the functional
equation, and the other based upon approximation to the policies which
yield these solutions. We have applied the traditional method above in
§ 5. Let us now discuss the second method.

Consider the scalar equation

(1)  $du/dt = \max_q\, [b(q, t) + a(q, t)\, u], \quad u(0) = c,$

where we impose the restrictions $|a(q, t)|,\ |b(q, t)| \le f(t)$, $\int_0^T f(t)\, dt < \infty$.

We begin by guessing an initial policy function $q_0 = q_0(t)$, and
determining $u_0$ by means of the equation

(2)  $du_0/dt = b(q_0, t) + a(q_0, t)\, u_0, \quad u_0(0) = c.$

Next determine $q_1$ by the condition that it maximize the function
$b(q, t) + a(q, t)\, u_0$, and compute $u_1$ as the solution of

(3)  $du_1/dt = b(q_1, t) + a(q_1, t)\, u_1, \quad u_1(0) = c.$

Continuing in this way we determine a sequence of functions $\{u_n\}$ and a
sequence of policies $\{q_n\}$. It remains to show that this sequence $\{u_n\}$
actually converges.

We have

(4)  $du_1/dt = b(q_1, t) + a(q_1, t)\, u_1, \quad u_1(0) = c,$

     $du_0/dt = b(q_0, t) + a(q_0, t)\, u_0$

     $\le b(q_1, t) + a(q_1, t)\, u_0, \quad u_0(0) = c,$

referring to the definition of $q_1$.

The solution of

(5)  $dv/dt = g(t)\, v + h(t), \quad v(0) = c,$

has the form

(6)  $v = c \exp\left( \int_0^t g(s)\, ds \right) + \int_0^t \exp\left( \int_s^t g(r)\, dr \right) h(s)\, ds,$

which we may write as $L(h)$, an operator on the function $h$.

Since the exponential function is positive, it follows that

(7)  $L(h_1) \ge L(h_2)$

if $h_1(t) \ge h_2(t)$ for $t \ge 0$. Hence

(8)  $u_0 \le u_1$ for $0 \le t \le T$.

Proceeding in the same fashion, we see inductively that $u_n \le u_{n+1}$ for
$n = 0, 1, 2, \ldots$ Since each member of the sequence $\{u_n\}$ is uniformly
bounded by $\left( |c| + \int_0^T f(s)\, ds \right) \exp\left( \int_0^T f(s)\, ds \right)$, it follows that the sequence
$\{u_n(t)\}$ converges to a function $u(t)$. This limit function satisfies the
integral equation

(9)  $u(t) = c + \int_0^t \max_q\, [b(q, s) + a(q, s)\, u]\, ds,$

and hence the differential equation almost everywhere. We see then that
approximation in policy space leads to convergent sequences in the
one-dimensional case.
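The scheme of this section can be sketched numerically as follows. The sketch makes several assumptions that are not in the text: a finite set of constant pairs $(a, b)$ stands in for $a(q, t)$ and $b(q, t)$, the linear equations are integrated with the explicit Euler method, and the names `solve_linear` and `improve` are hypothetical.

```python
def solve_linear(policy, options, c, h):
    """Explicit Euler for du/dt = b(q(t)) + a(q(t)) u, u(0) = c; q(t) is given on the grid."""
    u = [c]
    for q in policy:
        a, b = options[q]
        u.append(u[-1] + h * (b + a * u[-1]))
    return u

def improve(u, options):
    """New policy: at each grid point choose the q maximizing b(q) + a(q) u."""
    return [max(range(len(options)), key=lambda q: options[q][1] + options[q][0] * uk)
            for uk in u[:-1]]

# Hypothetical data: two constant options (a, b), i.e. du/dt = max(u, 1), u(0) = 0.5.
options = [(1.0, 0.0), (0.0, 1.0)]
c, T, steps = 0.5, 1.0, 2000
h = T / steps

policy = [0] * steps                 # initial guess q_0(t) = 0 for all t
iterates = [solve_linear(policy, options, c, h)]
for _ in range(6):
    policy = improve(iterates[-1], options)
    iterates.append(solve_linear(policy, options, c, h))
```

With this discretization each new iterate dominates the previous one at every grid point, which mirrors the monotonicity $u_n \le u_{n+1}$ established above; in this small example the policy stops changing after a few improvements.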

Let us turn now to the corresponding question for systems of the form

(10)  $dx/dt = \max_q\, [b(q, t) + A(q, t)\, x], \quad x(0) = c.$

Using the same procedure as above, it is easy to see that the problem
reduces to determining conditions upon the matrix $A(q, t)$ which will
ensure that $f(t) \ge 0$ for $t \ge 0$ implies $y \ge 0$ for $t \ge 0$, where $y$ is
the solution of

(11)  $dy/dt = A(q, t)\, y + f(t), \quad y(0) = 0.$

Since the solution of (11) is given by

(12)  $y = \int_0^t Y(t)\, Y^{-1}(s)\, f(s)\, ds,$

where $Y(t)$ is the matrix solution of

(13)  $dY/dt = A(q, t)\, Y, \quad Y(0) = I,$

we see that a necessary and sufficient condition is

(14)  $Y(t)\, Y^{-1}(s) \ge 0$ for $t \ge s \ge 0$,

and uniformly for $q \in S$.

Since this is a difficult condition to verify, we shall content ourselves
with the remark that $a_{ij}(q, t) \ge 0$, $i \ne j$, is a sufficient condition.

If then the condition $a_{ij}(q, t) \ge 0$, $i \ne j$, is satisfied for $t \ge 0$ and all
$q \in S$, we have the desired convergence in policy space.

§  10.  Discrete  versions 

In this section we wish to ascertain the asymptotic behavior of the
sequence $\{x_i(n)\}$, $i = 1, 2, \ldots, N$, determined by the recurrence
relations

(1)  $x_i(n+1) = \max_q \sum_{j=1}^{N} a_{ij}(q)\, x_j(n), \quad i = 1, 2, \ldots, N, \quad n \ge 0,$

under certain assumptions concerning the initial values $c_i = x_i(0)$ and the
coefficient matrix $A(q)$.

We shall begin by considering the homogeneous equation

(2)  $\lambda y_i = \max_q \sum_{j=1}^{N} a_{ij}(q)\, y_j, \quad i = 1, 2, \ldots, N,$

where we impose the following conditions

(3)  (a) $q = (q_1, \ldots, q_N)$ runs over a set $S$ with the property that the
         maximum is attained in (1) for any set of parameters
         $(y_1, \ldots, y_N)$.

     (b) $0 < a_{ij}(q) \le m < \infty$ for $q \in S$ and $i, j = 1, 2, \ldots, N$.

     (c) For any $q$, let $\varphi(q)$ denote the characteristic root of $A(q) =
         (a_{ij}(q))$ of largest absolute value, the Perron root. It is assumed
         that $\varphi(q)$ assumes its maximum for $q \in S$.

Let us now prove

Theorem 2. Under these assumptions, there exists a unique positive
constant $\lambda$ with the property that the homogeneous system in (2) has a positive
solution $y_i > 0$, $i = 1, 2, \ldots, N$. This solution is unique up to a
multiplicative constant, and

(4)  $\lambda = \max_{q \in S} \varphi(q).$

Proof. We shall begin by establishing the existence of a positive $\lambda$ and a
positive solution $\{y_i\}$. The simplest, though least elementary, method
employs the Brouwer fixed-point theorem. Consider the region defined by

(5)  $y_i \ge 0, \quad \sum_{i=1}^{N} y_i = 1.$

The normalized transformation

(6)  $y_i' = \left[ \max_q \sum_{j=1}^{N} a_{ij}(q)\, y_j \right] \Big/ \sum_{i=1}^{N} \max_q \left[ \sum_{j=1}^{N} a_{ij}(q)\, y_j \right],$

is a continuous mapping of this region into itself. It follows that there
exists a fixed point $\{y_i\}$, constituting the required positive solution since
$a_{ij}(q) > 0$. The parameter $\lambda$ is the denominator in (6).

To show that this solution is unique up to multiplicative constant, let
$[\mu, z]$ be another solution of (2) with $\mu > 0$ and $z$ a positive vector. Let
$\{q\}$ be a set of values for which the maximum is attained in (2) and $\{\bar q\}$ a
similar set associated with $z$. We have

(7)  $\lambda y_i = \sum_{j} a_{ij}(q)\, y_j \ge \sum_{j} a_{ij}(\bar q)\, y_j, \quad i = 1, 2, \ldots, N,$

     $\mu z_i = \sum_{j} a_{ij}(\bar q)\, z_j.$

Assume, without loss of generality, that $\lambda \le \mu$, and suppose first that $y$ and $z$
are non-proportional vectors. (If $y$ and $z$ are proportional, then $y = z$
under the normalization in (5).)
Let $\varepsilon$ be a positive constant chosen so that one, at least, of the
components $y_i - \varepsilon z_i$ is zero, one at least is positive, and the others are
non-negative. If $i$ is an index for which $y_i - \varepsilon z_i$ is zero, we have

(8)  $0 = \mu (y_i - \varepsilon z_i) \ge \lambda y_i - \varepsilon \mu z_i \ge \sum_{j=1}^{N} a_{ij}(\bar q)\, (y_j - \varepsilon z_j) > 0,$

since $a_{ij}(\bar q) > 0$, a contradiction. Hence $y$ and $z$ are proportional, which
means that $\lambda = \mu$.

To show that $\lambda = \max_q \varphi(q)$, we proceed as follows. Let $\mu = \max_q \varphi(q)$.
It is clear that $\lambda$, as the characteristic root of some $A(q)$, satisfies the
inequality $\lambda \le \mu$. Assume for the moment that $\lambda < \mu$. Let $z = (z_1, z_2, \ldots, z_N)$
be a positive characteristic vector associated with $\mu$ and $\bar q$ a set of $q$-values
which yield $\mu = \varphi(\bar q)$. Then we have

(9)  $\mu z_i = \sum_{j=1}^{N} a_{ij}(\bar q)\, z_j \le \max_q \sum_{j=1}^{N} a_{ij}(q)\, z_j.$

Since each $y_i$ is positive, we can find a positive constant $m$ such that

(10)  $z_i \le m y_i, \quad i = 1, 2, \ldots, N.$

Then (9) yields

(11)  $\mu z_i \le \max_q \left( \sum_{j=1}^{N} a_{ij}(q)\, y_j \right) m = m \lambda y_i.$

Thus, instead of (10) we obtain the result $z_i \le m y_i \lambda / \mu$. Iterating this, we
obtain $z_i \le m y_i (\lambda / \mu)^k$ for arbitrary $k$. Since $\lambda / \mu < 1$, by assumption,
this yields $z_i = 0$, a contradiction. Hence $\lambda = \mu$.
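Theorem 2 can be checked numerically on a small example. The sketch below assumes, as in the notation section, that each row has its own finite set of admissible rows, chosen independently of the other rows. It iterates the normalized transformation (6) and compares the resulting $\lambda$ with the largest Perron root over all combinations of rows. The data and the function names are hypothetical.

```python
from itertools import product

def nonlinear_power_iteration(row_choices, iterations=200):
    """Iterate the normalized map (6); returns (lambda estimate, fixed point y)."""
    n = len(row_choices)
    y = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        # Row i is maximized over its own admissible rows, independently of the others.
        w = [max(sum(r[j] * y[j] for j in range(n)) for r in row_choices[i])
             for i in range(n)]
        lam = sum(w)                      # the denominator in (6)
        y = [wi / lam for wi in w]
    return lam, y

def perron_root(matrix, iterations=500):
    """Ordinary power iteration for the largest root of a positive matrix."""
    n = len(matrix)
    y = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * y[j] for j in range(n)) for i in range(n)]
        lam = sum(w)
        y = [wi / lam for wi in w]
    return lam

# Hypothetical positive data: two admissible rows for each of the two equations.
row_choices = [
    [[0.5, 0.2], [0.3, 0.6]],
    [[0.4, 0.4], [0.1, 0.9]],
]
lam, y = nonlinear_power_iteration(row_choices)
# Brute force: largest Perron root over every combination of admissible rows.
best = max(perron_root([list(r) for r in rows]) for rows in product(*row_choices))
```

The brute-force comparison is feasible only for very small examples; the iteration of (6) avoids enumerating the combinations.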

§ 11.  The  recurrence  relation 

Returning to the recurrence relation of (10.1), let us prove

Theorem 3. If, in addition to the conditions of (10.3), we assume that there
is a unique $q$ for which the maximum value of $\varphi(q)$ is assumed, and that $c_i \ge 0$,
then

(1)  $x_i(n) \sim a\, y_i\, \lambda^n,$

as $n \to \infty$, where $a = a(c_1, c_2, \ldots, c_N)$.

Proof. Let us take $c_i > 0$, without loss of generality. There are then two
positive constants $k$ and $K$ such that $k y_i \le c_i \le K y_i$ for $i = 1, 2, \ldots, N$.

Let us show inductively that

(2)  $k y_i \lambda^n \le x_i(n) \le K y_i \lambda^n.$

Assume that we have the result for $n$; then

(3)  $x_i(n+1) \le K \lambda^n \max_q \sum_{j=1}^{N} a_{ij}(q)\, y_j = K \lambda^{n+1} y_i,$

     $x_i(n+1) \ge k \lambda^n \max_q \sum_{j=1}^{N} a_{ij}(q)\, y_j = k \lambda^{n+1} y_i.$

To establish the asymptotic behavior we show that for sufficiently
large $n$, the set of $q$'s which furnish the maximum in (10.1) is precisely
the set which yields $\max_q \varphi(q)$.

Assume the contrary. This means that infinitely often in the recurrence
relation of (10.1) we will employ a set $\{q\}$ which is not identical with the
set, $\{q^*\}$, which furnishes the maximum of $\varphi(q)$.

We then have

(4)  $x_i(n+1) = \sum_{j=1}^{N} a_{ij}(q)\, x_j(n), \quad i = 1, 2, \ldots, N,$

     $\le \left( \sum_{j=1}^{N} a_{ij}(q)\, y_j \right) K \lambda^n.$

For some index $i$ we must have

(5)  $\sum_{j=1}^{N} a_{ij}(q)\, y_j < \lambda y_i,$

with strict inequality. For if $\sum_{j=1}^{N} a_{ij}(q)\, y_j \ge \lambda y_i$ for all $i$, the characteristic
root of $A(q) = (a_{ij}(q))$ of largest absolute value would at least equal
$\lambda = \max_q \varphi(q)$, contradicting the assumption concerning the uniqueness of
the maximum of $\varphi(q)$.

Hence for some component, say the first, we have

(6)  $x_1(n+1) \le \theta K \lambda^{n+1} y_1, \quad 0 < \theta < 1.$

Since $a_{ij}(q^*) > 0$ for all $i, j$, where $q^*$ is the value of $q$ for which
$\lambda = \varphi(q^*)$, we see that, for $i = 1, 2, \ldots, N$,

(7)  $x_i(n+2) \le K \lambda^{n+1} \left[ \sum_{j=2}^{N} a_{ij}(q^*)\, y_j + \theta\, a_{i1}(q^*)\, y_1 \right]$

     $\le \theta_1 K \lambda^{n+2} y_i,$

where $\theta_1 < 1$.

Consequently, if a set of $q$'s distinct from $q^*$ are used $R$ times, we obtain

(8)  $x_i(n) \le \theta_1^R K \lambda^n y_i,$

for $n$ large. Since $0 < \theta_1 < 1$, we eventually contradict the lower bound
for $x_i(n)$ if $R$ is large. Hence a set of $q$'s distinct from $q^*$ can only be used
a bounded number of times, with the bound determined by $k$ and $K$.

§  12.  Min-max 

The same method we employed to demonstrate Theorem 1 establishes
the following result

Theorem 4. Consider the equation

(1)  $dx/dt = \max_p \min_q\, [A(p, q, t)\, x + b(p, q, t)]$

     $= \min_q \max_p\, [\ldots], \quad x(0) = c,$

where we assume that

(2)  (a) For fixed values of $x$ and $t$, the max-min in (1) is equal to the
         min-max, where $p$ and $q$ range over some set of admissible vectors $S$.

     (b) $\max_S \|A(p, q, t)\|,\ \max_S \|b(p, q, t)\| \le f(t)$ for $t \ge 0$, where
         $\int_0^T f(t)\, dt < \infty$.

Then there is a unique solution to (1) in $0 \le t \le T$ which satisfies the equation
almost everywhere, and may be found as the limit of the following sequence

(3)  (a) $x_0 = c,$

     (b) $x_{n+1} = c + \int_0^t \max_p \min_q\, [A(p, q, s)\, x_n + b(p, q, s)]\, ds$

         $= c + \int_0^t \min_q \max_p\, [A(p, q, s)\, x_n + b(p, q, s)]\, ds.$

§  13.  Generalization  of  a  von  Neumann  result 

In the chapter devoted to multi-stage games, we established the result
that

(1)  $\max_p \min_q \dfrac{(Ap, q)}{(Bp, q)} = \min_q \max_p \dfrac{(Ap, q)}{(Bp, q)},$

where $A$ and $B$ are matrices, and $p$ and $q$ are probability vectors, provided
that $(Bp, q) \ge d > 0$ for all $p$ and $q$.

Let us now obtain the following generalization

Theorem 5. Consider the scalar equation

(2)  $du/dt = \max_p \min_q\, [(Ap, q) - (Bp, q)\, u], \quad u(0) = c,$

     $= \min_q \max_p\, [(Ap, q) - (Bp, q)\, u].$

If $(Bp, q) \ge d > 0$ for all probability vectors $p$ and $q$, we have

(3)  $\lim_{t \to \infty} u(t) = \max_p \min_q \dfrac{(Ap, q)}{(Bp, q)} = \min_q \max_p \dfrac{(Ap, q)}{(Bp, q)}.$

Proof. The classical min-max theorem of von Neumann guarantees the
equality of max-min and min-max of $(Ap, q) - (Bp, q)\, u$ for each $u$. The
other conditions of Theorem 4 are satisfied and ensure the existence and
uniqueness of $u(t)$.

To obtain the asymptotic behavior, consider first the scalar equation

(4)  $du/dt = a - bu, \quad u(0) = c,$

where $a$ and $b$ are constants and where $b > 0$. It is easy to see that the
solution is bounded as $t \to \infty$, and we can show that $\lim_{t \to \infty} u(t) = a/b$ by
means of the following simple argument. Whenever $du/dt = 0$, we must
have $u = a/b$. Hence $u(t)$ can have at most one turning point for
$t > 0$, and thus is ultimately monotone. Since $u(t)$ is bounded, it
approaches a finite limit which must be $a/b$.
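For the linear equation (4) the conclusion can also be read off from the explicit solution, recorded here only as a check on the turning-point argument:

```latex
\[
u(t) \;=\; \frac{a}{b} + \Bigl( c - \frac{a}{b} \Bigr) e^{-bt}
\;\longrightarrow\; \frac{a}{b}
\qquad (t \to \infty,\ b > 0).
\]
```

The turning-point argument has the advantage that it carries over to the nonlinear equations that follow, where no such closed form is available.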

Consider the nonlinear equation

(5)  $du/dt = \max_p\, [a(p) - b(p)\, u], \quad u(0) = c,$

where $b(p) \ge b > 0$ for all $p$, $|a(p)| \le M$ for all $p$, and $a(p)$ and $b(p)$
are such that the maximum is assumed. At any turning point of $u$, we
must have

(6)  $u = \max_p\, a(p) / b(p).$

Consequently, $u(t)$ must be ultimately monotone and approach the finite
limit given in (6).
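A small numerical experiment illustrates the limit in (6). The sketch assumes a finite set of hypothetical pairs $(a(p), b(p))$ and integrates (5) with the explicit Euler method; the function name `simulate` and all numerical values are chosen only for this example.

```python
def simulate(options, c, T=20.0, h=0.01):
    """Explicit Euler for du/dt = max_p [a(p) - b(p) u], u(0) = c."""
    u = c
    for _ in range(int(round(T / h))):
        u += h * max(a - b * u for a, b in options)
    return u

# Hypothetical finite set of (a(p), b(p)) pairs with b(p) >= 1 > 0.
options = [(1.0, 1.0), (3.0, 2.0), (2.0, 4.0)]
u_final = simulate(options, c=0.0)
limit = max(a / b for a, b in options)    # the value predicted by (6)
```

Here `u_final` can be compared with `limit`; starting from other initial values `c` gives a quick check that the limit does not depend on the initial condition.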

We see that precisely the same argument works for the equation of (2).
At a turning point we must have

(7)  $u = \max_p \min_q \dfrac{(Ap, q)}{(Bp, q)} = \min_q \max_p \dfrac{(Ap, q)}{(Bp, q)}.$

Exercises  and  Research  Problems  for  Chapter  XI 

1. A merchant has $n$ identical items and a length of time $t$ to dispose of
these. Goods may be sold at times $0, 1, 2, \ldots, t$, and the probability of
selling an item depends upon its price. Let $\varphi(z)$ be the probability that an
item of price $z$ is sold when displayed at a particular time.

Define $f_n(t)$ to be the maximum expected return from $n$ items over a
maximum sale period of $t$. Assuming independence of sales, obtain the
recurrence relation

$f_n(t) = \max_{z \ge 0} \left[ \sum_{k=0}^{n} \binom{n}{k} \varphi(z)^k (1 - \varphi(z))^{n-k} \left[ f_{n-k}(t-1) + kz \right] \right],$

with $f_n(0) = 0$. (Darling)

2. Assume that the items are on sale continuously, and that $\varphi(z)\, dt$
represents the probability that an item of price $z$ will be sold in a time-interval
$(t, t + dt)$. Show that the limiting form of the above recurrence relation is

$f_N'(t) = \max_{z \ge 0}\, [-N \varphi(z)\, f_N(t) + N \varphi(z)\, f_{N-1}(t) + N z\, \varphi(z)],$

$f_N(0) = 0, \quad N \ge 1, \quad f_0(t) = 0.$

3. Consider the case $N = 1$ in Problem 4. Show that if we solve the
equation

$F'(t) = -\varphi(z)\, F(t) + z\, \varphi(z), \quad F(0) = 0,$

obtaining

$F(t) = F(t, z) = \int_0^t e^{-\varphi(z)(t - s)}\, z\, \varphi(z)\, ds,$

then $f_1(t) = \max_{z \ge 0} F(t, z)$.

4. Show that the equation

$f_1'(t) = \max_{z \ge 0}\, [-\varphi(z)\, f_1(t) + z\, \varphi(z)], \quad f_1(0) = 0,$

is equivalent to the two equations

$f_1'(t) = -\varphi(z)\, f_1(t) + z\, \varphi(z), \quad f_1(0) = 0,$

$0 = -\varphi'(z)\, f_1(t) + \varphi(z) + z\, \varphi'(z).$

5. Consider in detail the particular cases where

a. $\varphi(z) = b\, e^{-bz}$

b. $\varphi(z) = (k - z)/k, \quad 0 \le z \le k,$

   $\varphi(z) = 0, \quad z > k.$

6. Obtain the solution of the equations in Exercise 2 for general $N$.

7.  Consider  the  similar  situation  in  which  we  have  the  same  item  in  two 
price  ranges.  How  do  we  set  the  prices  ? 

8.  Consider  the  process  in  Problem  1  in  which  we  reduce  the  price  per 
item  for  multiple  orders.  How  should  this  be  done  to  maximize  expected 
profit  ? 

9. Establish existence and uniqueness theorems for integral equations of
the form

$u(t) = \max_q \left[ a(q, t) + \int_0^t K(q, t, s)\, u(s)\, ds \right],$

under appropriate assumptions.

10. Obtain results corresponding to those in § 8 for the equations

$u' = u^k + p(t)\, u + q(t),$

for $k > 1$ and $0 < k < 1$.

11. Consider the general case where

$u' = g(u, t),$

and $g$ is either strictly convex in $u$ for all $t$, or strictly concave.

12. Consider the Riccati equation

$\dfrac{du}{dt} = u^2 + a(t), \quad u(0) = c,$

and the sequence of successive approximations defined by

$\dfrac{du_0}{dt} = 2 u_0 v_0 - v_0^2 + a(t), \quad u_0(0) = c,$

$\dfrac{du_{n+1}}{dt} = 2 u_{n+1} u_n - u_n^2 + a(t), \quad u_{n+1}(0) = c,$

where $v_0(t)$ is an arbitrary continuous function.

Show that [illegible in source] policy space, [illegible in source]
of definition.

13. Similarly, consider [equation illegible in source] in connection with
the equation [illegible in source].


14. What is the connection between this method of successive
approximations and Newton's method of solving equations?

15. What is the connection between the approximation scheme
above and the concept of approximation in policy space?